Chapter 6

Ontology for Data Analytics

Fatmana Şentürk

Abstract Nowadays, with the development of technology, the storage, processing, and
extraction of information from data have become important. Thus, when a system is
built, it should be designed to work in harmony with each of these data processing
steps. Data analytics is one of the methods used to develop
such a system. Data analytics applications are used in many different areas such as
increasing market shares of a firm, customer behavior analysis, predicting the life of
an electronic device, detection of the anomaly on a network, social network analysis,
healthcare systems, chemical component interactions, and bank operations. These
data analytics applications can obtain data from different sources, and these sources
must interact with each other. It is not always easy to design this interactive
architecture. These difficulties can be overcome by using ontologies. For data
analytics, ontologies can be used for facilitating data collection, improving the
quality of the data used, analyzing data, showing the obtained results, and ensuring
the reusability of the designed system. In this study, we introduce an overview of
data analytics and explain data analytics steps; in addition, we seek to answer the
question of how to enrich and improve a data analytics system by using ontologies.
We give different examples of how to use ontologies in different steps in the
systems. Moreover, we emphasize the pros and cons of using ontologies in data
analytics. We then discuss the future outlook for these ontologies for data analytics.

Keywords Data analytics · Data analytics processes · Semantic Web · Ontologies ·
Ontology-based data analytics

F. Şentürk (*)
Engineering Faculty, Computer Engineering Department, Pamukkale University, Denizli,
Turkey
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
S. Jain, S. Murugesan (eds.), Smart Connected World,
https://doi.org/10.1007/978-3-030-76387-9_6

Key Points
In this chapter you will learn:
• The definition of data analytics
• The subprocesses of data analytics
• The importance of ontologies for data analytics
• How to use ontologies for data analytics
• Examples of data analytics using ontologies for different domains
• The pros and cons of using ontologies for data analytics
• How ontologies can be used for data analytics in the future

6.1 What Is Data Analytics?

With the development of technology, we face rapidly growing volumes of data.
Storing, processing, and making sense of these data are central concerns for
information technology. In particular, meaningful information must be extracted
from these data quickly and accurately. Data analytics methods are used to meet
these requirements.
Data analytics is a process that enables the analysis of raw data to find potential
hidden trends and to extract information through the methods it uses. Data analytics
techniques enable organizations to make more informed decisions to improve
themselves. For example, in commercial industries, data analytics is widely used
for purposes such as increasing the market share of companies, modeling customer
behavior, and estimating the life span of an electronic component. Data analytics
techniques are used not only by commercial companies but also by different
industries. There are many applications of data analytics in the health field, in the
estimation of chemical component interactions, and in the banking sector.

6.1.1 Types of Data Analytics

Data analytics is a very broad concept that covers a variety of data analyses. It allows
the processing of data in any format and can be used for purposes such as tracking how
data changes over time, finding the source of a problem, and analyzing customer churn.
Considering these purposes, there are four types of data analytics: descriptive,
diagnostic, predictive, and prescriptive. Different methods are applied for each of
these four types.

6.1.1.1 Descriptive Analytics

Descriptive analytics is a type of data analytics that is used to answer
questions about how data changes over a period of time. Descriptive
analytics aims to present an understandable summary of the data to be used in a
decision-making process for business intelligence. Company reports on matters such as
the increase or decrease in a company’s sales, its financial status, and the retention
of its customers are examples of descriptive analytics. In addition, descriptive
analytics methods are used to answer questions such as “By how much
has the sales amount increased in the last month?”, “How loyal are the company’s
customers?”, and “How much has the market share of the firm increased or
decreased in the last year?” Descriptive analytics relies on statistical tools such as
averages and percentage changes, along with simple arithmetic operations.
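The kind of summary described above can be sketched in a few lines. This is a minimal illustration, assuming a small set of invented monthly sales figures; the names `monthly_sales`, `percentage_change`, and `describe` are hypothetical and do not come from the chapter.

```python
from statistics import mean

# Hypothetical monthly sales figures (units sold); not taken from the chapter.
monthly_sales = {"2021-01": 120, "2021-02": 135, "2021-03": 150, "2021-04": 144}


def percentage_change(previous: float, current: float) -> float:
    """Return the relative change from previous to current, in percent."""
    return (current - previous) / previous * 100.0


def describe(sales: dict[str, float]) -> dict[str, float]:
    """Summarize a sales series: overall average and last month-over-month change."""
    months = sorted(sales)  # ISO-style keys sort chronologically
    last, prior = sales[months[-1]], sales[months[-2]]
    return {
        "average": mean(sales.values()),
        "last_month_change_pct": percentage_change(prior, last),
    }


summary = describe(monthly_sales)
print(summary)
```

A report built this way answers a question such as “By how much has the sales amount changed in the last month?” without attempting to explain or predict the change.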

6.1.1.2 Diagnostic Analytics

Diagnostic analytics comprises methods used to determine the cause of
a problem. These methods are used in conjunction with descriptive analytics
methods: diagnostic analytics investigates the reasons behind the findings
obtained with descriptive analytics. For example, when a decline in a
company’s air conditioner sales is detected, diagnostic analytics is used to determine
whether the decline is due to a seasonal transition. Techniques used for diagnostic
analytics include data discovery, drill-down, data mining, and correlation analysis.
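As a rough illustration of the correlation technique mentioned above, the following sketch checks whether air conditioner sales move together with temperature. The observations are invented, and the function name `pearson` is an assumption made for this example. A high coefficient would support, but not prove, a seasonal explanation.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly observations: average temperature (°C) and units sold.
temperature = [30.0, 28.0, 22.0, 15.0, 9.0]
ac_sales = [410.0, 380.0, 250.0, 120.0, 60.0]

r = pearson(temperature, ac_sales)
print(f"correlation between temperature and sales: {r:.3f}")
```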

6.1.1.3 Predictive Analytics

Predictive analytics comprises methods used to predict future conditions. These methods
use historical data to identify trends and to determine whether these trends are
likely to recur. For example, predictive analytics is used to answer
questions such as “What will the sales percentage be this summer?” A variety of methods
are used for predictive analytics, such as neural networks, decision trees, and
regression analysis.
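Of the methods listed above, regression analysis is the simplest to show. The sketch below fits a straight line to invented summer sales figures with ordinary least squares and extrapolates one year ahead; the data and the name `fit_line` are assumptions for illustration only.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical history: summer sales (units) for four consecutive years.
years = [2017.0, 2018.0, 2019.0, 2020.0]
summer_sales = [200.0, 220.0, 240.0, 260.0]

slope, intercept = fit_line(years, summer_sales)
forecast_2021 = slope * 2021 + intercept
print(f"forecast for 2021: {forecast_2021:.1f}")
```

A linear trend is only one possible model; real sales data would usually call for seasonality handling and a validation step before the forecast is trusted.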

6.1.1.4 Prescriptive Analytics

Prescriptive analytics offers different solutions to improve the path followed in the
execution of a process. In this method, past events are analyzed and an attempt is
made to predict the probability of different outcomes. For example, if a
company predicts that sales will decrease in the summer period, prescriptive analytics
can determine what methods to follow to prevent this decrease. Combinations of
techniques such as business rules, business-specific algorithms, machine learning,
and computational modeling methods are used for prescriptive analytics.

Fig. 6.1 Steps of KDD (Tsai et al., 2015)
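A prescriptive step can be sketched as a business rule applied on top of a prediction. In the example below, the candidate actions, their costs, and their expected uplifts are all invented, and `Action` and `recommend` are hypothetical names. The rule is simply to choose the action with the highest net benefit when a decline is forecast.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    name: str
    cost: float             # hypothetical cost of carrying out the action
    expected_uplift: float  # hypothetical extra revenue if the action is taken


def recommend(predicted_drop: float, actions: list[Action]) -> Optional[Action]:
    """Pick the action with the best net benefit, if any beats doing nothing."""
    if predicted_drop <= 0:
        return None  # business rule: no forecast decline, no intervention
    best = max(actions, key=lambda a: a.expected_uplift - a.cost, default=None)
    if best is None or best.expected_uplift - best.cost <= 0:
        return None
    return best


candidates = [
    Action("summer discount", cost=500.0, expected_uplift=1200.0),
    Action("ad campaign", cost=900.0, expected_uplift=1400.0),
    Action("product bundle", cost=300.0, expected_uplift=250.0),
]
choice = recommend(predicted_drop=1000.0, actions=candidates)
print(choice.name if choice else "no action")
```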

6.1.2 Processes of Data Analytics

Data analytics generally consists of three stages, which are shown in Fig. 6.1: data
acquisition (data collection and selection), preprocessing and transformation, and
analysis. The processes of selection, preprocessing, transformation, data mining, and
interpretation or evaluation are also used for knowledge discovery in databases (KDD)
(Fayyad et al., 1996). KDD means analyzing the data stored in databases and
extracting previously unknown information from it. Data analytics likewise
includes the process of extracting hidden information from existing data. In addition,
the data mining techniques used for KDD are also applied to some of the problems
that data analytics addresses. In summary, data analytics and KDD follow a similar
process and use similar techniques. Considering these similarities, data analytics can
be thought of as KDD (Tsai et al., 2015).
The first stage in data analytics is that of collecting data and making it ready for
processing and analysis. Data is obtained from different sources as a result of
developing technological infrastructures and easier access to data. For example, a
company collects and processes many types of data, such as the products that its
customers have reviewed, the duration of the customer’s review of that product, and
whether the customer has purchased that product or not. The company may also
collect data from platforms such as social media or the Internet, and the collected
data is used for data analysis. Therefore, this company has more than one source
from which to obtain data, and the data from these sources should be brought into a
consistent format. For this reason, the useful parts of the collected data must first be
selected and then converted into a specific format for analysis. If the same company
compares customers by their product review times, the time each customer spends
on a product should be recorded in a single unit, such as seconds or
minutes.
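The unit conversion described above can be sketched as follows. The record layout, the unit labels, and the names `SECONDS_PER_UNIT` and `to_seconds` are assumptions made for this example; real sources would need their own mapping.

```python
# Conversion factors to seconds for the units the hypothetical sources report.
SECONDS_PER_UNIT = {"s": 1.0, "min": 60.0, "h": 3600.0}


def to_seconds(value: float, unit: str) -> float:
    """Convert a review duration to seconds; reject units we do not know."""
    try:
        return value * SECONDS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unknown time unit: {unit!r}") from None


# Hypothetical records from two sources that log review time differently.
raw_records = [
    {"customer": "c1", "review_time": 90, "unit": "s"},
    {"customer": "c2", "review_time": 2.5, "unit": "min"},
]

normalized = [
    {"customer": r["customer"], "review_seconds": to_seconds(r["review_time"], r["unit"])}
    for r in raw_records
]
print(normalized)
```

Rejecting unknown units, rather than guessing, keeps a silent conversion error from propagating into the analysis stage.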
After the data is collected, selected, and transformed, the next step is its analysis.
The analysis performed depends on the decisions that the company needs to make.
For example, if the company wants to forecast
the number of sales for the following year, predictive analytics is applied.
For this purpose, the data collected by the company is analyzed by
applying statistical methods, artificial neural networks, or machine learning-based
algorithms.
The last step is the interpretation of the results of the analysis.
During the interpretation step, the numerical values obtained are supported by
graphics and figures and transformed into a form that is easier for
end-users to understand. In addition, the accuracy of the inferences
drawn from the analysis is assessed in this step. For example, if diagnostic analytics
has produced candidate reasons for a decrease in the company’s sales, the accuracy of
those reasons is evaluated in this step.
Data analytics has a wide scope and can be used in many areas. Therefore,
different methods are needed to transform data from different sources into a
processable format, to run the proper algorithms for the information sought, and
to present the results of the analysis. Ontologies are one of these methods. The following
sections discuss the relationship between data analytics and ontologies.

6.2 What Are Ontologies?

Today’s technologies can process different types of data such as images, sound,
video, and text. The heterogeneous structure of these data should be transformed into
a structure that can be processed by the computer. This structure can be provided via
the Semantic Web.
The Semantic Web is defined as a structure that aims to let computers interpret Web
pages as a human would and to develop computers that understand people’s
requests (Berners-Lee & Fischetti, 2001). Through the Semantic Web, information
is transformed into a format that can be processed not only by humans but also by
computers. Metadata models are defined so that computers can automatically detect
and process data. These metadata model definitions are provided via ontologies.
Ontologies are metadata in which the concepts specific to a domain, the relationships
between these concepts, and the instances of the concepts are defined together.
The relationships between concepts and real-world entities also represent
semantic connections between these entities. In other words, an ontology is a
collection of data items that helps in storing and representing data in a way that
preserves its patterns and the semantic relationships between the items (Malik & Jain,
2021). Ontologies have a wide range of uses, and for this reason various ontologies
are created automatically or semiautomatically for many fields.
Ontologies provide a framework for data integration from heterogeneous sources.
This framework can be used in many areas, such as data representation, information
extraction and combination, information management, database integration,
data transformation, natural language processing, digital libraries, geographic
information systems, visual information access, and multi-agent systems (Kolli,
2008).

6.2.1 Ontologies and Their Applications

Since ontologies can convert data from heterogeneous sources into a standard
format, they can be used in many areas, such as data representation, information
finding and combining, information management, database integration, data trans-
formation, natural language processing, digital libraries, geographic information
systems, visual information access, and multi-agent systems.
Ontologies store information in structures called triples, using the RDF, RDFS, and
OWL languages. A triple is an arrangement of the form subject-predicate-object.
With these triple structures, domain-specific rules and restrictions can be defined,
and domain-specific instances can be stored. Figure 6.2 presents an example of a
triple view of an ontology in which a person’s family relationships are defined.

Fig. 6.2 An example of a triple view of family relationship ontology
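To make the subject-predicate-object arrangement concrete, the sketch below stores a few family facts as plain Python tuples and queries them with a wildcard pattern. It deliberately avoids any RDF library; the individuals, the predicate names, and the `match` helper are hypothetical and are not the contents of Fig. 6.2.

```python
Triple = tuple[str, str, str]

# Hypothetical family facts; the names are illustrative, not those of Fig. 6.2.
triples: set[Triple] = {
    ("Ayse", "type", "Person"),
    ("Mehmet", "type", "Person"),
    ("Ayse", "hasChild", "Mehmet"),
    ("Mehmet", "hasSibling", "Elif"),
}


def match(store, subject=None, predicate=None, obj=None) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


print(match(triples, predicate="hasChild"))
```

In practice the same pattern-matching idea is what a query language over triples provides, with the added benefit of shared vocabularies and identifiers.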
Ontologies can also store different types of data obtained from different sources
through the metadata language they use. The structural and semantic differences among
these data make them difficult for information systems to process.
These heterogeneous systems should be able to work together and provide data
integration between them. Ontologies can be used to successfully operate this
interaction between systems and to eliminate the problem of heterogeneity. For
example, for a system using a combination of video, signal, and textual data, etc.,
converting the collected data into an appropriate format and using this data as input
for another system can be achieved with ontologies. In short, an ontology combines
schema specifications with data to represent information (Mehla & Jain, 2020).
In addition to storing information, ontologies also store
rules defined specifically for their domain. Using these
rules, inferences can be made on the existing information through special queries.
That is, by using the rules and data defined in an ontology, it is possible to obtain
information that is not explicitly stated in the ontology. Information extraction can be
achieved by using the inference mechanism of ontologies.
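The idea of deriving statements that are not stored explicitly can be illustrated with a single hand-written rule. This is a simplified sketch, not an ontology reasoner: the rule, the predicate names, and the function `infer_grandparents` are assumptions made for the example.

```python
Triple = tuple[str, str, str]


def infer_grandparents(facts: set[Triple]) -> set[Triple]:
    """Apply one rule: hasParent(x, y) and hasParent(y, z) imply hasGrandparent(x, z)."""
    parents = [(s, o) for s, p, o in facts if p == "hasParent"]
    derived = {
        (child, "hasGrandparent", grandparent)
        for child, parent in parents
        for mid, grandparent in parents
        if parent == mid
    }
    return derived - facts  # only the statements that were not stored explicitly


facts = {
    ("Elif", "hasParent", "Mehmet"),
    ("Mehmet", "hasParent", "Ayse"),
}
print(infer_grandparents(facts))
```

The grandparent relation never appears in `facts`, yet it follows from the stored data and the rule, which is the sense in which inference yields new information.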

6.2.2 Ontology and Data Analytics

In data analytics, there are sub-steps such as obtaining data, converting the data into
a processable form, analyzing the data, and explaining the results of the analysis.
Ontologies can be used at each of these sub-steps.
Ontology-based architectures have been developed for data analytics. Ontologies
in these architectures can be included in any part of data analytics systems. Ontol-
ogies can be used in the process of collecting data, preprocessing data, storing data,
and enhancing data quality. For example, during the data collection step, data can be
retrieved in a formatted manner by using domain-specific constraints provided by the
ontologies. Before analyzing the data, by using ontologies, synonyms can be added
to the data set to increase the data quality or enrich the data in a preprocessing step.
Subcomponents that make up the relevant part of the data obtained can be included
in the data set through ontologies, in case of data deficiency.
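Using domain-specific constraints during data collection can be sketched as a validation step. The constraints below are written by hand to stand in for restrictions that an ontology might declare; the field names, the allowed values, and the helper `violations` are hypothetical.

```python
# Hypothetical constraints, standing in for restrictions an ontology might declare.
CONSTRAINTS = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "country": lambda v: v in {"TR", "DE", "US"},
}


def violations(record: dict) -> list[str]:
    """List the constrained fields that are missing or hold a disallowed value."""
    return [
        field for field, is_valid in CONSTRAINTS.items()
        if field not in record or not is_valid(record[field])
    ]


records = [
    {"age": 34, "country": "TR"},
    {"age": -5, "country": "TR"},
    {"age": 51},
]
accepted = [r for r in records if not violations(r)]
print(accepted)
```

Records that fail a constraint could also be repaired or enriched rather than dropped; the filter here is only the simplest possible policy.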
In the data processing step, ontologies can be used to reduce the search space.
Infrastructure can be created for rule-based data mining algorithms through the
inference mechanism provided by ontologies. For example, ontologies can be used
for the analysis of human resources. The attributes specific to the human resource
sought for the job, including the naming of these attributes, can also be stored in an
ontology. While screening people who possess these qualities, the qualities that are
defined in the ontology and that can replace each other are taken into consideration.
In addition, the inferences obtained from data mining algorithms can also be
enriched through ontologies. Different graphics can be obtained by expanding the
analysis results through ontologies. A flexible and analytical visualization environment
can be provided if the applications that visualize analysis
results are combined with ontology approaches. Ontologies are easy to model as
graphs: each concept in the ontology can be shown as a vertex, and the relationships
between these concepts can be shown as edges between vertices. Therefore,
ontologies obtained by modeling the analysis results can be shown as graphs that are
easier for users to understand. Moreover, different display techniques can be
applied to these graphs, and different perspectives can be developed.
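The graph view described above can be sketched with an adjacency list, where concepts are vertices and relationships are labelled edges. The concepts and relation names are invented for illustration, and `to_graph` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical concept-level relationships (concept, relation, concept).
relations = [
    ("Customer", "buys", "Product"),
    ("Product", "belongsTo", "Category"),
    ("Customer", "writes", "Review"),
    ("Review", "about", "Product"),
]


def to_graph(edges):
    """Build an adjacency list: vertex -> list of (edge label, neighbour)."""
    graph = defaultdict(list)
    for source, label, target in edges:
        graph[source].append((label, target))
        graph.setdefault(target, [])  # keep vertices that have no outgoing edge
    return dict(graph)


graph = to_graph(relations)
for vertex, out_edges in graph.items():
    print(vertex, "->", out_edges)
```

A structure like this can be handed to any graph drawing or traversal routine, which is what makes the different display techniques mentioned above possible.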

6.3 Different Perspectives on Ontology and Data Analytics

Ontologies present information in a structured form and support information
extraction from existing data through inference mechanisms. For this reason, ontologies
provide a very useful infrastructure for data analytics methods, and their wide range of
uses has made them an integral part of data analytics. In this section, we
examine a few examples to clarify the relationship between ontologies and
data analytics.

6.3.1 Business Intelligence

Business intelligence is a set of processes and architectures that examine the status of
the methods planned for a business, identify opportunities for the company, and
make the existing raw data meaningful to develop these opportunities. Companies
find new opportunities thanks to business intelligence and try to gain an advantage in
the competitive market environment by determining their effective strategies. The
general flow of business intelligence is shown in Fig. 6.3.
With their strong abstraction features for defining the entities, data properties, and
relationships between entities of a specific domain, ontologies are well suited to natural
language interaction. Ontologies can be used to capture patterns in the
expected workload and to use these patterns to build a conversation system (Quamar et al.,
2020).
In addition, ontologies can capture the measures and dimensions defined in the
cube for the business intelligence model by using entities, their taxonomy, or
hierarchies. These measures refer to categorical or qualitative properties that contain
one or more computable values. At the same time, ontologies can represent a higher-level
grouping of data on measures/dimensions, either provided by an expert or derived
from raw data. Thanks to these capabilities, ontologies can take over
the role that expert opinion plays in the traditional business
intelligence model. Domain knowledge that would otherwise be obtained from experts
can be added automatically to the business intelligence model by using ontologies
defined for the domain. Similarly, the boundaries of the data to be used in
the system can be determined by using the constraints in ontologies. Thus, experts
can focus their attention on analyzing the data and presenting the results in
business intelligence systems. An architecture for using ontologies in business
intelligence is given in Fig. 6.4.

Fig. 6.3 An architecture of traditional business intelligence (Neuböck et al., 2013)

6.3.2 Healthcare

Data analytics methods can also be used in the field of health, especially in hospitals
and clinics, for purposes such as evaluating treatment costs, improving the treatment
processes of patients, and preventing patients from having to return to the hospital. A
data analytics system that supports doctors can be developed by evaluating various
kinds of patient information, such as MR images, epicrisis notes, patient complaints, and
symptoms, through ontology-based systems. For example, assume
that an ontology that stores abnormal regions in MR images is integrated into the MR
imaging system. Abnormal regions can be
defined in the ontology by using properties such as width, length, and diameter.
Possible abnormal regions can then be identified in a patient’s image by using the
defined ontology. In addition, these data analytics systems can analyze
the complaints for which patients are returning to the hospital, in order to prevent
rehospitalization or return of patients to the emergency room. The cost-effectiveness
of various procedures and treatments applied in hospitals can also be evaluated with
semantic data analysis methods.

Fig. 6.4 An architecture of integrating ontologies into business intelligence applications (Neuböck
et al., 2013)
Ontologies are also used for biological structures and drug structures. The most
comprehensive examples of these are the Gene Ontology and the Foundational Model of
Anatomy (FMA) ontology. The FMA ontology (Rosse & Mejino, 2003) contains
classes for 75,000 different anatomical terms and 130,000 relationships among
them. It describes all the major parts of the body, especially the human body,
starting with the smallest gene structures. Using these definitions,
the reaction that a newly developed drug or treatment may cause can be predicted
without testing on humans.
To give another example, with data analytics the company developing a new drug may be
able to predict the drug interactions that could occur without conducting
any human experiments. For this process, the active
ingredient of the drug should first be checked to determine whether
similar drugs are already available. Possible effects can be predicted by assuming that drugs
with similar active ingredients will show similar side effects. In addition, knowing
which active substances have a negative effect on each other makes it possible to
determine whether a drug will interact with any other drug or chemical
the patient uses. Data analytics methods can be used for all these calculations. The data
sets required for the data analytics steps can be provided by existing ontologies such
as SNOMED¹ and DRON² (the Drug Ontology).
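The lookup logic described above can be sketched with two small tables. Everything in the example is fictional: the drug names, ingredients, side effects, and interacting pairs are placeholders, and none of them is taken from SNOMED or DRON. The sketch only shows the shape of the reasoning, not a clinically meaningful result.

```python
# Entirely fictional drugs, ingredients, and effects, used only to show the lookup.
KNOWN_DRUGS = {
    "drug_a": {"ingredient": "compound_x", "side_effects": {"nausea", "headache"}},
    "drug_b": {"ingredient": "compound_x", "side_effects": {"dizziness"}},
    "drug_c": {"ingredient": "compound_y", "side_effects": {"rash"}},
}
# Fictional pairs of ingredients assumed to affect each other negatively.
NEGATIVE_PAIRS = {frozenset({"compound_x", "compound_z"})}


def candidate_side_effects(ingredient: str) -> set[str]:
    """Union of side effects recorded for known drugs sharing the ingredient."""
    effects: set[str] = set()
    for drug in KNOWN_DRUGS.values():
        if drug["ingredient"] == ingredient:
            effects |= drug["side_effects"]
    return effects


def may_interact(ingredient_a: str, ingredient_b: str) -> bool:
    """True if the two ingredients are listed as a negative pair."""
    return frozenset({ingredient_a, ingredient_b}) in NEGATIVE_PAIRS


print(sorted(candidate_side_effects("compound_x")))
print(may_interact("compound_z", "compound_x"))
```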
Considering that the development of gene ontology will continue, in the future,
data analytics will contribute to the realization of aims such as developing custom-
ized drugs for each person, applying personalized treatment methods, and improving
living conditions depending on a person’s habits. For example, when an individual’s
nutritional habits are analyzed, the risk of suffering any disease can be calculated.
When this risk factor is combined with the person’s gene map, the rate of occurrence
of this disease can be calculated. Furthermore, the steps to be taken to prevent the
disease can be determined by using this result. Measures can be taken for living
conditions and nutritional habits that create individual risk factors. While taking
these measures, individuals can create their own conditions with the help of smart
Semantic Web applications, instead of being given strict lists. Thus, people can be
made aware of alternative nutrition methods or living conditions that can replace
existing ones.

6.3.3 Information Retrieval and Ontology

Information retrieval is the process of obtaining information to meet an information need
in large data collections. It includes the subprocesses of “query formatting
and analyzing,” “information and documents indexing,” and “projecting the
retrieved information.” Information retrieval is used specifically for the analysis or
retrieval of documents: specific queries are run
on the documents, and the appropriate results are obtained for these queries.

¹ SNOMED Ontology, https://bioportal.bioontology.org/ontologies/SNOMEDCT, last accessed:
November 30, 2020.
² DRON, Drug Ontology, https://bioportal.bioontology.org/ontologies/DRON, last accessed:
November 30, 2020.

Information retrieval, which is widely used in search engines, has four subcomponents:
indexing, query processing, searching, and ranking. In the indexing step,
the attributes of the information gathered from various
sources are extracted, and the descriptive features are sorted and stored in a special
format. In the query processing step, the word or phrase to be searched is
parsed and converted into suitable objects for searching among the indexed features.
In the searching step, the features of the searched words are looked up in the previously
indexed attributes. The ranking step calculates the relevance of the information
obtained from the search to the word or phrase searched. In other words, the relationship
between the documents obtained as a result of the search and the search query is
calculated and the documents are displayed to the end-user according to this
calculation.
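The four subcomponents of a search engine can be sketched end to end on a toy collection. The documents are invented, and the ranking is a plain sum of term frequencies chosen for brevity; it is an assumption of this example, not the scoring method of any particular search engine.

```python
from collections import Counter, defaultdict

# Hypothetical mini-collection of documents.
DOCS = {
    "d1": "ontology based data analytics",
    "d2": "data mining and data analytics",
    "d3": "social network analysis",
}


def build_index(docs):
    """Indexing: map each term to the documents containing it, with counts."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def process_query(query):
    """Query processing: normalize the query into searchable terms."""
    return query.lower().split()


def search(index, terms):
    """Searching and ranking: score documents by summed term frequency."""
    scores = Counter()
    for term in terms:
        scores.update(index.get(term, {}))
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_index(DOCS)
print(search(index, process_query("Data Analytics")))
```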
This four-step architecture of search engines can be transformed into a smart
system that carries out intelligent searches by using the Semantic Web and ontologies.
Search engines can develop data concepts to collect and access similar data
from the Internet and the Web. At the same time, ontologies can be used to find
semantic similarities in the collected information and to enrich data. Search
engines can make use of the synonyms stored in ontologies in the indexing step and
use the same feature for query processing. The inference mechanisms and synonym
storage capabilities of ontologies can also be used when transforming the
searched words or phrases into a query. In addition to the word or phrase entered
by the end-user, similar search expressions can be added to the search by using
ontologies. That is, the synonyms of the words entered by the end-user are found by
using ontologies, and these synonyms are searched together with the
originally entered words. Thus, a correct and broader flow of information
can be provided to the end-user. In addition, ontology matching approaches can be
used at this stage to establish links between the query and the search results.
Moreover, during the ranking process, similar data concepts can be shown to users by
using the data model generated to store the data.
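The synonym-based query expansion described above can be sketched with a small lookup table. The synonym sets are written by hand to stand in for labels that an ontology might store, and `expand_query` is a hypothetical helper.

```python
# Hypothetical synonym sets, standing in for labels stored in an ontology.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "cheap": {"inexpensive"},
}


def expand_query(query: str, synonyms: dict[str, set[str]]) -> list[str]:
    """Return the query terms followed by their synonyms, without duplicates."""
    expanded: list[str] = []
    for term in query.lower().split():
        for candidate in [term, *sorted(synonyms.get(term, ()))]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("cheap car rental", SYNONYMS))
```

The expanded term list would then be passed to the searching step, so that documents using a synonym of the entered word are also retrieved.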

6.3.4 Social Network Analysis

Nowadays, most people can share their feelings and thoughts through social media
applications. With the widespread use of social media, people’s likes, positive or
negative comments about a film or product, and their interactions with each other
have become important. Recently, feedback shared on social media applications has
also attracted the attention of companies. For example, in the past, when there was no
use of social media, it took a certain amount of time for a company to collect user
reviews for a new product. Now, people are using the company’s product and then
sharing their positive or negative opinions instantly on social media. The company
can see these posts seconds later. The information that users voluntarily share via
social media has become a fast source of data for companies.
Comments and feedback given by users are among the features that companies
can use in their decision-making structures. Many companies include customer
feedback in their business intelligence applications. For example, a phone company
may consider negative customer comments for the previous model for a new model
phone it is about to release. Thus, they can decide which of the parts, such as screen,
battery, camera, or processor, will be highlighted for the new model they are
developing. Companies can obtain these customer comments from the complaint
logs or fault records they receive, as well as from comments made on social
networks. They can combine the comments they collect from social media with the
comments submitted directly to the company and then apply data analytics methods to
them. Thus, the company can identify the deficiencies of the
product and make improvements in the new product it is releasing.
In addition, specific surveys can be conducted on social media to learn about the
general structure of societies. Social networks are applications that people from
throughout society can easily access. With its widespread use, any information can
spread to the public in a short time. This information can be a survey or news. For
example, companies that want to use the speed of social media carry out marketing
activities for their newly developed products through social networks. Furthermore,
companies benefit from the power of social media to determine their potential
customers and to ensure the continuity of their customers.
Some of the methods used in the analysis of social network data are data analytics
methods. Especially for applications where user comments are analyzed, using data
analytics methods together with ontologies can provide more accurate results. For
example, the abbreviations used in user comments can differ from one
community to another, so these abbreviations need to be evaluated
semantically. Language-based differences and restrictions can be supported through
ontologies, and thus it may be possible to evaluate user comments semantically. In
addition, the language support that ontologies already provide makes it possible
to include user comments in different languages in the analysis.
Companies can advertise their own products through social media. Searches
made via social media applications are evaluated to get an idea about a product,
and products produced by other companies similar to the searched product are shown
in the application as an advertisement. In this way, companies get the opportunity to
market their own products. Similarly, information such as pages, products, and links
that comes to us as suggestions is also produced using data analytics methods. Every
word we have shared is included in the data analytics process. Social media
applications extract attributes of their own users and find similar users using these
attributes. These applications offer us pages and products liked by users with similar
profiles to ours in the form of advertisements. In this profile extraction process,
ontologies can be used to store instances and to calculate similarities of the profiles
by interoperating with data analytics methods. Thanks to the instance storage
potential of ontologies, instances in ontologies can be used as sample data in analysis
processes.
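The profile comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: user profiles are represented as sets of ontology instance identifiers, and Jaccard similarity is used as the similarity measure. The chapter does not prescribe a specific measure, and the profile data and the function names jaccard, most_similar, and suggest are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical profiles: each user is described by ontology instances.
PROFILES = {
    "alice": {"interest:Photography", "interest:Hiking", "interest:Cameras"},
    "bob": {"interest:Photography", "interest:Cameras", "interest:Drones"},
    "carol": {"interest:Cooking", "interest:Gardening"},
}


def most_similar(user, profiles):
    """Return the name of the other user whose profile is closest to `user`."""
    scored = [
        (jaccard(profiles[user], attrs), name)
        for name, attrs in profiles.items()
        if name != user
    ]
    return max(scored)[1]


def suggest(user, profiles):
    """Suggest attributes that the most similar user has and `user` lacks."""
    neighbour = most_similar(user, profiles)
    return sorted(profiles[neighbour] - profiles[user])


print(most_similar("alice", PROFILES))
print(suggest("alice", PROFILES))
```

Set-based similarity is a deliberately simple choice here. An ontology-aware measure could also take the class hierarchy into account, so that two different but closely related interests still contribute to the score.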

6.4 The Present and Future of Semantic Data Analytics

Rapidly increasing data size and increasing commercial competition have led many
companies to use data analytics methods. Data analytics techniques can determine
the methods to be followed in order to increase the market share of the
companies, and can also guide the decisions that companies should take in
their internal processes. Data analytics techniques can be used not only by
companies but also in banking, healthcare, bioinformatics calculations, chemical
interaction estimation, personalization of a product, and even social interactions in a
society.
The use of data analytics techniques in many areas has led to numerous problems.
The most significant of these problems is obtaining the data required for data
analytics. In particular, there is a need for the interaction of nonuniform data
obtained from different sources and the development of a system that works in
harmony with this data. Ontologies are used together with data analytics techniques
in order to ensure cooperation between pieces of data. The use of ontologies together
with data analytics techniques has several advantages over other systems, as listed
below:
• The data to be analyzed should be complete and free of noise, which
increases the accuracy of the results obtained. Ontologies can be used to
complete data and to eliminate noise in the data.
• For data analytics, data obtained from different sources can interact with other
sources through ontologies. Ontologies are capable of storing any type of data
due to their nature. Moreover, for the entities defined in the ontology, different
qualifications can be defined, such as what the data type will be, in which value
range the data can be found, and the relationship of data with another entity.
• Similarity properties called synonyms can be defined for an entity in the ontology.
For example, the words “dictionary” and “glossary” in English can be defined as
synonyms. Especially for data analytics where text processing is performed, the
integration of this ability of the ontologies with the data analytics architecture will
lead to more accurate results. Using data analytics methods and ontologies
together will allow the determination of the semantic similarities of the processed
data. In addition, adding synonyms for data analytics will enrich the data set.
• Ontologies have the ability to support different languages with their flexible
structures. The equivalent of a term in a different language can be defined and
stored in the ontology. This feature allows data in different languages to serve as a
source for data analysis and to be processed within the data
analytics system.
• Inferences can be made by applying specific rules to ontologies. With this
inference mechanism, business rules can be created or association rules of the
data can be defined. This inference mechanism can create the infrastructure of
rule-based data analytics methods.
• Ontologies are metadata that describe entities, properties of these entities, sample
data, and relationships between these properties for a specific field. The sample
data, rules, and limitations related to the domain are ready to use for the systems,
thanks to the domain information contained in the ontologies. Using data analytics
methods and ontologies together facilitates the work of domain experts who
take part in the data analysis process.
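Two of the advantages listed above, property constraints (data type and value range) and rule-based inference, can be illustrated with a short sketch. It assumes the constraints and rules have already been read out of an ontology into plain Python structures. The property names, the value ranges, the fever threshold, and the function names validate and infer are all hypothetical choices made for this example.

```python
# Hypothetical property definitions: expected data type and value range.
PROPERTY_SPECS = {
    "age": {"type": int, "min": 0, "max": 120},
    "temperature_c": {"type": float, "min": 30.0, "max": 45.0},
}


def validate(record, specs):
    """Return a list of constraint violations found in one record."""
    problems = []
    for name, spec in specs.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
        elif not spec["min"] <= value <= spec["max"]:
            problems.append(
                f"{name}: {value} outside [{spec['min']}, {spec['max']}]"
            )
    return problems


# Hypothetical rule set: (condition over a record, fact to infer).
RULES = [
    (lambda r: r["temperature_c"] >= 38.0, "has_fever"),
]


def infer(record, rules):
    """Apply each rule to a validated record and collect the inferred facts."""
    return [fact for condition, fact in rules if condition(record)]


record = {"age": 34, "temperature_c": 38.6}
if not validate(record, PROPERTY_SPECS):
    print(infer(record, RULES))
```

Validating before inferring mirrors the point made in the list: incomplete or out-of-range values are caught first, so the rules only run on data that satisfies the constraints defined for the domain.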
It is true that ontologies contribute positively to data analytics methods. However,
there are situations that need attention and that have disadvantages. These situations
and disadvantages are listed below:
• Although using ontologies together with data analytics, during data acquisition, or
during data integration seems to have positive results, it may cause extra workload
in some conditions. For example, in a data analytics architecture where only
numerical data is obtained and processed, adding ontologies to the preprocessing
step may introduce extra checks and decrease the performance of the architecture.
Therefore, deciding whether to use ontologies is an important issue.
• In addition to deciding whether ontologies should be included in data analytics or
not, it is also important to determine at what stage of data analysis (data collection,
data preprocessing, data analysis, or result) they should be included.
Choosing the wrong step for the ontology may
prevent the system from working properly, or the ontology may
not contribute to the results obtained.
• When companies or experts decide to use ontologies, calibrations should be made
on the data analytics techniques that they currently use. There will be a need for
expert personnel in the field who know the properties of ontologies well and can
use the advantages of the ontologies for making calibrations of these techniques.
To overcome this problem, either an expert in ontologies should be hired or
existing personnel should be trained, which takes time.
• The inclusion of synonyms and foreign language support in data analytics
architecture through ontologies can lead to longer analysis times.
• In addition, data security at every stage of data analytics is also an important
problem. If the company fails to provide data security, it could lose the trust of its
customers or suffer financial loss.
The combined use of data analytics and ontologies will create new opportunities
in data management, storage, and processing (Konys, 2016). More architectures
that combine ontologies and data analytics techniques are likely to appear in the
near future. For example, it will be possible to obtain information about our health
status through our mobile phones and to direct first aid personnel automatically in
case of an emergency. With the applications that will enter our lives,
people’s behavior, social media shares, and phone messages will be analyzed to
infer their emotional state. Many products that are standardized
today will be personalized in the future through ontologies
that continue to be developed, such as gene ontologies, or through ontologies that will be newly
defined.

6.5 Conclusion

Nowadays, with the development of technology, data size has increased rapidly, and
storing data, processing data, and extracting information from data have become important.
Data analytics architectures are one of the methods used to develop a structure
that works in harmony with each of these processes. Data analytics architectures are
used in the decision-making processes of companies, as well as in many different
areas such as anomaly detection in a network, social network analysis, and health
systems.
Data analytics is used to answer questions about how data changes over a
specific time period, as well as to seek an answer to the question “Why is
this problem occurring?” In addition, data analytics can be used to predict future
conditions for a decision or process. Data analytics can also improve the way a
process is run or solve a problem by using different techniques. Methods used for
data analytics vary according to the area of application or the problems identified by
the companies.
Obtaining data is one of the most important processes for data analytics. In this
step, data should be complete, faultless, and high quality even if it is collected from
different sources, in order to get correct results from data analytics. One of the
methods to achieve this is using ontologies with data analytics. For data analytics,
ontologies can be used to facilitate data collection, improve the quality of the data
used, analyze the data, show the results obtained, and ensure the reusability of the
designed system. In this study, we provided different examples of how data analytics
systems can be used with ontologies. We aimed to show the diversity of the
applications developed by showing examples from different fields. We also
explained the pros and cons of using data analytics and ontologies together. In
summary, we emphasized the importance of using ontologies with data analytics
for today’s and future technologies.
Review Questions
1. What is data analytics? Why do we need data analytics applications?
2. What are the types of data analytics? Explain the uses of each type.
3. Indicate the steps of data analytics processes and briefly describe the operations
performed in each step.
4. What is an ontology? What are the facilities provided by ontologies?
5. Explain the relationship between data analytics and ontologies.
Discussion Questions
1. How can the Semantic Web be included in the stages of data analytics?
2. Which types of problems can you encounter while developing data analytics
applications? How can you eliminate these problems by taking advantage of the
Semantic Web?
3. How can we integrate user experiences or comments into companies’ business
intelligence applications? How does this integration process contribute to business
intelligence applications?
4. Discuss whether search engines can retrieve personalized results by using
ontology-based data analytics architectures.
Problem Statements for Young Researchers
1. What kinds of applications come to mind when you think of ontology-based data
analytics architectures? Can these applications be smarter than classic data
analytics methods?
2. Might the results obtained with ontology-based data analytics methods be different
from the results obtained with classic data analytics methods? If you want to change the
results, how would you integrate ontologies into your data analytics architecture?
3. What are the difficulties in developing ontology-based data analytics applications?
How can these difficulties be overcome?
4. Can ontology-based data analytics applications be used for security? If such an
application is designed, in which step/steps can ontologies be utilized?

References

Berners-Lee, T., & Fischetti, M. (2001). Weaving the web: The original design and ultimate destiny
of the World Wide Web by its inventor. DIANE Publishing Company.
Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge
discovery in databases. AI Magazine, 17(3), 37–54.
Kolli, R. (2008). Scalable matching of ontology graphs using partitioning. Doctoral dissertation,
University of Georgia.
Konys, A. (2016, October). Ontology-based approaches to big data analytics. In International
Multi-conference on Advanced Computer Systems (pp. 355–365). Cham: Springer.
Malik, S., & Jain, S. (2021, February). Semantic ontology-based approach to enhance text
classification. In International Semantic Intelligence Conference, Delhi, India. 25–27 Feb 2021.
CEUR Workshop Proceedings (Vol. 2786, pp. 85–98). Retrieved from http://ceur-ws.org/Vol-2786/Paper16.pdf
Mehla, S., & Jain, S. (2020). An ontology supported hybrid approach for recommendation in
emergency situations. Annals of Telecommunications, 75(7), 421–435.
Neuböck, T., Neumayr, B., Schrefl, M., & Schütz, C. (2014). Ontology-driven business intelligence
for comparative data analysis. In E. Zimányi (Ed.), Business intelligence. eBISS 2013. Lecture
Notes in Business Information Processing (Vol. 172). Cham: Springer. https://doi.org/10.1007/978-3-319-05461-2_3
Rosse, C., & Mejino, J. L., Jr. (2003). A reference ontology for biomedical informatics: The
foundational model of anatomy. Journal of Biomedical Informatics, 36(6), 478–500.
Quamar, A., Özcan, F., Miller, D., Moore, R. J., Niehus, R., & Kreulen, J. (2020). Conversational
BI: An ontology-driven conversation system for business intelligence applications. Proceedings
of the VLDB Endowment, 13(12), 3369–3381.
Tsai, C. W., Lai, C. F., Chao, H. C., & Vasilakos, A. V. (2015). Big data analytics: A survey.
Journal of Big Data, 2(1), 1–32.

Fatmana Şentürk received the Ph.D. degree in computer science from the Department of
Computer Engineering, Ege University, in 2019 and the MS and BS degrees in computer science
from the Department of Computer Engineering, Pamukkale University, in 2012 and 2008, respectively. She is
currently a research assistant at Pamukkale University in Turkey. Her research interests include data
science, ontology matching, graphs, and graph algorithms.
