Ontology for Data Analytics
Fatmana Şentürk
F. Şentürk (*)
Engineering Faculty, Computer Engineering Department, Pamukkale University, Denizli,
Turkey
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
S. Jain, S. Murugesan (eds.), Smart Connected World,
https://doi.org/10.1007/978-3-030-76387-9_6
Key Points
In this chapter you will learn:
• The definition of data analytics
• The subprocesses of data analytics
• The importance of ontologies for data analytics
• How to use ontologies for data analytics
• Examples of data analytics using ontologies for different domains
• The pros and cons of using ontologies for data analytics
• How ontologies can be used for data analytics in the future
With the development of technology, the volume of data we face is growing rapidly.
Storing and processing these data, and making sense of them, is a central concern of
information technology. In particular, meaningful information must be extracted from
the data quickly and accurately. Data analytics methods are used to meet these
requirements.
Data analytics is the process of analyzing raw data to uncover hidden trends and to
extract information. Data analytics techniques enable organizations to make more
informed decisions and thereby improve themselves. For example, in commercial
industries, data analytics is widely used for purposes such as increasing a company’s
market share, modeling customer behavior, and estimating the life span of an
electronic component. These techniques are not limited to commercial companies:
there are many applications of data analytics in the health field, in the estimation
of chemical component interactions, and in the banking sector.
Data analytics is a very broad concept that covers a variety of data analyses. It can
process data in any format and can be used for purposes such as tracking how data
change over time, finding the source of a problem, and analyzing customer churn.
According to these purposes, data analytics is divided into four types: descriptive,
diagnostic, predictive, and prescriptive analytics. Different methods are applied for
each of these four types.
Diagnostic analytics comprises methods used to determine the cause of a problem.
These methods are used in conjunction with descriptive analytics: diagnostic
analytics investigates the reasons behind the findings obtained with descriptive
analytics. For example, when a decline in a company’s air conditioner sales is
detected, diagnostic analytics is used to determine whether the decline is due to a
seasonal transition. Diagnostic analytics relies on techniques such as data discovery,
drill-down, data mining, and correlation.
Predictive analytics comprises methods used to predict future conditions. These
methods use historical data to identify trends and to determine whether these trends
are likely to recur. For example, predictive analytics is used to answer questions such
as “What will the sales percentage be this summer?” A variety of methods are used to
answer such questions, including neural networks, decision trees, and regression
analysis.
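As a rough illustration of the regression approach mentioned above, the following sketch fits a straight line to past sales and extrapolates one period ahead. The monthly figures are invented for the example and the helper name fit_line is an assumption; a real forecast would also need to account for seasonality.

```python
# Hypothetical monthly sales figures (units sold); not taken from the chapter.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, sales)

# Extrapolate the fitted trend to the next (seventh) month.
forecast_month_7 = slope * 7 + intercept
print(round(forecast_month_7, 2))
```

Because the sample data lie exactly on a line, the forecast simply continues the trend; with real data the fit would only approximate the history.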
Prescriptive analytics offers alternative solutions to improve the way a process is
carried out. In this method, past events are analyzed and the probability of different
outcomes is estimated. For example, if a company predicts that sales will decrease in
the summer period, prescriptive analytics can determine which measures to take to
prevent this decrease. Combinations of
out a comparison based on the product review times of the customers, the customers
should keep the time they spend on that product in a single type such as seconds or
minutes.
After the data is collected, selected, and transformed, the next step is its analysis.
The analysis carried out in this phase depends on the decisions the company needs to
make in order to progress. For instance, if the company wants to forecast the number
of sales in the following year, a predictive analytics method is applied. For this
purpose, the data collected by the company is analyzed with statistical methods,
artificial neural networks, or machine learning-based algorithms.
The last step is the interpretation of the results of the analysis. During the
interpretation step, the numerical values obtained are supported by graphics and
figures and transformed into a form that is easier for end-users to understand. The
accuracy of the inferences drawn from the analysis is also assessed in this step. For
example, once the reasons for a decrease in the company’s sales have been obtained
with diagnostic analytics methods, the accuracy of these reasons is evaluated in this
step.
Data analytics has a wide scope and can be used in many areas. Therefore, different
methods are needed to transform data from different sources into a processable
format, to run the proper algorithms for the information sought, and to present the
results of the analysis. Ontologies are one of these methods. The following sections
discuss the relationship between data analytics and ontologies.
Today’s technologies can process different types of data such as images, sound,
video, and text. The heterogeneous structure of these data should be transformed into
a structure that can be processed by the computer. This structure can be provided via
the Semantic Web.
The Semantic Web is defined as a structure that aims to let computers interpret Web
pages as a human would and to develop computers that understand people’s requests
(Berners-Lee & Fischetti, 2001). Through the Semantic Web, information is
transformed into a format that can be processed not only by humans but also by
computers. Metadata models are defined so that computers can automatically detect
and process data. These metadata models are defined by ontologies.
Ontologies are metadata in which the concepts specific to a domain, the relationships
between these concepts, and the instances of the concepts are defined together. The
relationships between concepts also represent the semantic connections between the
corresponding real-world entities. In other words, an ontology is a collection of data
items that helps in storing and representing data in a way that preserves its patterns
and the semantic relationships between the items (Malik & Jain, 2021). Ontologies
have a wide range of uses, and for this reason various ontologies are created
automatically or semiautomatically for many fields.
Ontologies provide a framework for integrating data from heterogeneous sources.
Because they can convert such data into a standard format, they can be used in many
areas, such as data representation, information extraction and combination,
information management, database integration, data transformation, natural language
processing, digital libraries, geographic information systems, visual information
access, and multi-agent systems (Kolli, 2008).
Ontologies store information in structures called triples, using the RDF, RDFS, and
OWL languages. Each triple has the form subject-predicate-object. With these triples,
domain-specific rules and restrictions can be defined, and instances from the relevant
domain can be stored. Figure 6.2 presents an example of a triple view of an ontology in
which a person’s family relationships are defined.
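The subject-predicate-object structure can be sketched with plain tuples. The example below is a minimal in-memory stand-in for a triple store, not the ontology of Fig. 6.2: the people, the predicate names hasParent and hasSibling, and the helper match are all invented for illustration.

```python
# Each fact is a (subject, predicate, object) triple.
# All names below are invented for illustration.
triples = {
    ("Alice", "rdf:type", "Person"),
    ("Bob", "rdf:type", "Person"),
    ("Carol", "rdf:type", "Person"),
    ("Alice", "hasParent", "Bob"),
    ("Bob", "hasSibling", "Carol"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return sorted(
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


# Instances of the class Person.
people = [s for (s, _, _) in match(triples, predicate="rdf:type", obj="Person")]

# Objects of the hasParent relationship for one subject.
alice_parents = [o for (_, _, o) in match(triples, subject="Alice", predicate="hasParent")]

print(people)
print(alice_parents)
```

Leaving a field as None acts as a wildcard, which is roughly how a triple pattern in a query language selects matching statements.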
Through the metadata language they use, ontologies can also store different types of
data obtained from different sources. The structural and semantic differences among
these data make them difficult for information systems to process. Heterogeneous
systems nevertheless need to work together and exchange data. Ontologies can be
used to support this interaction between systems and to eliminate the problem of
heterogeneity. For example, in a system that combines video, signal, and textual data,
ontologies can be used to convert the collected data into an appropriate format and
to pass it as input to another system. In short, an ontology combines schema
specifications with data to represent information (Mehla & Jain, 2020).
In addition to storing information, ontologies also store rules defined specifically for
their domain. Using these rules, inferences can be made on the existing information
through special queries. That is, by combining the rules and data defined in an
ontology, it is possible to derive information that is not explicitly stated in the
ontology. Information extraction can thus be achieved by using the inference
mechanism of ontologies.
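The idea of deriving facts that are not stored explicitly can be sketched with a single hand-written rule applied until nothing new appears (forward chaining). This is a simplified stand-in for an ontology reasoner; the facts, the rule, and the predicate hasAuntOrUncle are assumptions made for the example.

```python
# Explicitly stored facts (invented for illustration).
facts = {
    ("Alice", "hasParent", "Bob"),
    ("Bob", "hasSibling", "Carol"),
    ("Carol", "hasSibling", "Bob"),
}


def infer_aunt_or_uncle(known):
    """Rule: hasParent(x, y) and hasSibling(y, z) => hasAuntOrUncle(x, z)."""
    derived = set()
    for (x, p1, y) in known:
        if p1 != "hasParent":
            continue
        for (y2, p2, z) in known:
            if p2 == "hasSibling" and y2 == y:
                derived.add((x, "hasAuntOrUncle", z))
    return derived


def forward_chain(known, rules):
    """Apply the rules repeatedly until no new triple is produced."""
    closure = set(known)
    while True:
        new = set()
        for rule in rules:
            new |= rule(closure) - closure
        if not new:
            return closure
        closure |= new


closure = forward_chain(facts, [infer_aunt_or_uncle])

# Triples that were derived rather than stored.
inferred = closure - facts
print(inferred)
```

The relationship between Alice and Carol is never stated in the input; it only appears after the rule has been applied to the stored triples.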
Data analytics consists of sub-steps such as obtaining data, converting the data into
a processable form, analyzing the data, and explaining the results of the analysis.
Ontologies can be used in each of these sub-steps.
Ontology-based architectures have been developed for data analytics. In these
architectures, ontologies can be included in any part of the data analytics system:
collecting data, preprocessing data, storing data, and enhancing data quality. For
example, during the data collection step, data can be retrieved in a well-formed
manner by using the domain-specific constraints provided by the ontologies. In a
preprocessing step before the analysis, synonyms drawn from ontologies can be added
to the data set to improve its quality or to enrich it. Where data are missing, the
subcomponents that make up the relevant part of the data can be added to the data set
through ontologies.
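A possible shape for such a collection and preprocessing step is sketched below: records are checked against a constraint, product names are mapped to a canonical term through synonyms, and durations are converted to a single unit. The synonym table, the allowed units, and the record fields are hypothetical placeholders for what a domain ontology would supply.

```python
# Hypothetical fragment of a product ontology: canonical terms with synonyms,
# plus a constraint on the values a property may take.
SYNONYMS = {
    "air conditioner": {"ac", "a/c", "air con"},
    "refrigerator": {"fridge"},
}
ALLOWED_UNITS = {"seconds", "minutes"}

# Reverse index: every synonym (and the canonical term itself) -> canonical term.
CANONICAL = {term: term for term in SYNONYMS}
for term, alternatives in SYNONYMS.items():
    for alt in alternatives:
        CANONICAL[alt] = term


def clean_record(record):
    """Return a normalized copy of the record, or None if it violates a constraint."""
    product = CANONICAL.get(record["product"].strip().lower())
    if product is None or record["unit"] not in ALLOWED_UNITS:
        return None
    # Store every duration in seconds so that records are comparable.
    seconds = record["duration"] * (60 if record["unit"] == "minutes" else 1)
    return {"product": product, "view_seconds": seconds}


raw = [
    {"product": "AC", "duration": 2, "unit": "minutes"},
    {"product": "Fridge", "duration": 45, "unit": "seconds"},
    {"product": "toaster", "duration": 10, "unit": "seconds"},  # not in the ontology
    {"product": "a/c", "duration": 1, "unit": "hours"},         # unit not allowed
]
cleaned = [r for r in (clean_record(x) for x in raw) if r is not None]
print(cleaned)
```

Rejected records are dropped here for brevity; a production pipeline would more likely log them or route them to manual review.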
In the data processing step, ontologies can be used to reduce the search space. The
inference mechanism provided by ontologies can also serve as infrastructure for
rule-based data mining algorithms. For example, ontologies can be used for the
analysis of human resources. The attributes required for a job, together with the
names of these attributes, can be stored in an ontology. When candidates are screened
for these attributes, the attributes that the ontology defines as interchangeable are
taken into consideration.
In addition, the inferences obtained from data mining algorithms can be enriched
through ontologies. Different graphics can be obtained by expanding the analysis
results through ontologies. A flexible and analytical visualization environment can be
provided if the applications that visualize analysis results are combined with ontology
approaches. Ontologies are easy to model as graphs: each concept in the ontology can
be shown as a vertex, and the relationships between concepts can be shown as edges
between vertices. Therefore, ontologies obtained by modeling the analysis results can
be shown as graphs that are visually easier for users to understand. Moreover,
different display techniques can be applied to these graphs, and different
perspectives can be developed.
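The mapping from concepts and relationships to vertices and edges can be sketched with an adjacency list. The concepts and predicate names below are invented, and the breadth-first traversal is only one example of an operation that becomes available once the ontology is held as a graph.

```python
from collections import defaultdict, deque

# Hypothetical relationships between concepts, stored as triples.
relations = [
    ("Phone", "hasPart", "Screen"),
    ("Phone", "hasPart", "Battery"),
    ("Battery", "hasProperty", "Capacity"),
    ("Screen", "hasProperty", "Resolution"),
]


def build_graph(triples):
    """Concepts become vertices; each relationship becomes a labeled edge."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
        graph[obj]  # touch the key so that leaf concepts are vertices too
    return graph


def reachable(graph, start):
    """Breadth-first traversal returning the concepts reachable from start."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for _, neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


graph = build_graph(relations)
visited = reachable(graph, "Phone")
print(visited)
```

The resulting structure can be handed to any graph drawing or graph analysis routine, which is what makes the graph view convenient for visualization.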
Business intelligence is a set of processes and architectures that examine the status of
the methods planned for a business, identify opportunities for the company, and
make the existing raw data meaningful to develop these opportunities. Companies
find new opportunities thanks to business intelligence and try to gain an advantage in
the competitive market environment by determining their effective strategies. The
general flow of business intelligence is shown in Fig. 6.3.
Ontologies are well suited to natural language interaction because of their strong
abstraction features for defining the entities, data properties, and relationships
between entities of a specific domain. Ontologies can be used to capture patterns in the
expected workload and to use these patterns to build a conversation system (Quamar
et al., 2020).
In addition, ontologies can capture the measures and dimensions defined in the
cube of the business intelligence model by using entities and their taxonomies or
hierarchies. These measures refer to categorical or qualitative properties that contain
one or more computable values. At the same time, ontologies can represent a
higher-level grouping of the data on measures and dimensions, either provided by an
expert or derived from raw data. Thanks to these capabilities, ontologies have taken
over the role that expert opinion plays in the traditional business intelligence
model. Domain knowledge that would otherwise be obtained from experts can be added
to the business intelligence model automatically by using ontologies defined for the
domain. Similarly, the boundaries of the data to be used in the system can be
determined by using the constraints in ontologies. Thus, experts
can focus their attention on the analysis of data and showing the results of data in the
6.3.2 Healthcare
Data analytics methods can also be used in the field of health, especially in hospitals
and clinics, for purposes such as evaluating treatment costs, improving the treatment
processes of patients, and preventing patients from having to return to the hospital. A
data analytics system that supports doctors can be developed by evaluating various
kinds of patient information, such as MR images, epicrisis notes, patient complaints,
and symptoms, through ontology-based systems. For example, assume that an ontology
that stores abnormal regions in MR images is integrated into the MR imaging system.
Abnormal regions can be defined in the ontology by properties such as width, length,
and diameter. Possible abnormal regions can then be identified in a patient’s image
using the defined ontology. In addition, these data analytics systems can analyze
the complaints for which patients are returning to the hospital, in order to prevent
rehospitalization or return of patients to the emergency room. The cost-effectiveness
of various procedures and treatments applied in hospitals can also be evaluated with
semantic data analysis methods.
Fig. 6.4 An architecture of integrating ontologies into business intelligence applications (Neuböck
et al., 2013)
Ontologies are also used for biological structures and drug structures. The most
comprehensive examples are the Gene Ontology and the Foundational Model of
Anatomy (FMA) ontology. The FMA ontology (Rosse & Mejino, 2003) contains classes for
75,000 different anatomical terms and 130,000 relationships among these classes. It
describes all the major parts of the body, especially of the human body, starting with
the smallest gene structures. Using these definitions, it can be predicted how a newly
developed drug or treatment may cause a reaction, without testing on humans.
To give another example, data analytics may allow the company developing a new drug
to predict possible drug interactions without conducting any human experiments. For
this process, first the active ingredient of the drug should be checked, and it should
be determined whether similar drugs are already available. Possible effects can be
predicted by assuming that drugs with similar active ingredients will show similar
side effects. In addition, knowing which active substances have a negative effect on
each other makes it possible to determine whether the new drug will interact with any
other drug or chemical the patient uses. Data analytics methods may be used for all
these calculations. The data sets required for the data analytics steps can be
provided by existing drug ontologies such as SNOMED¹ and DRON² (the Drug Ontology).
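The two checks described above, shared active ingredients and known negative pairs, can be sketched as set operations. Everything in the example is a placeholder: the drug names, ingredients, effects, and interaction pair are invented and carry no medical meaning, and the dictionary layout is an assumption rather than the structure of SNOMED or DRON.

```python
# Placeholder data: drug names, ingredients, and effects are invented and
# carry no medical meaning. A real system would read them from a drug ontology.
KNOWN_DRUGS = {
    "drug_a": {"ingredients": {"ingredient_x"}, "side_effects": {"nausea"}},
    "drug_b": {"ingredients": {"ingredient_x", "ingredient_y"}, "side_effects": {"headache"}},
    "drug_c": {"ingredients": {"ingredient_z"}, "side_effects": {"drowsiness"}},
}
# Pairs of active ingredients assumed to affect each other negatively.
NEGATIVE_PAIRS = {frozenset({"ingredient_x", "ingredient_z"})}


def candidate_side_effects(new_ingredients):
    """Collect side effects of known drugs that share an active ingredient."""
    effects = set()
    for drug in KNOWN_DRUGS.values():
        if drug["ingredients"] & new_ingredients:
            effects |= drug["side_effects"]
    return effects


def conflicting_drugs(new_ingredients):
    """List known drugs with an ingredient that interacts negatively with the new drug."""
    conflicts = []
    for name, drug in sorted(KNOWN_DRUGS.items()):
        pairs = {frozenset({a, b}) for a in new_ingredients for b in drug["ingredients"]}
        if pairs & NEGATIVE_PAIRS:
            conflicts.append(name)
    return conflicts


new_drug = {"ingredient_x"}
predicted = candidate_side_effects(new_drug)
conflicts = conflicting_drugs(new_drug)
print(sorted(predicted), conflicts)
```

The output is only a list of candidates to investigate; the similarity assumption stated in the text does not replace clinical evaluation.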
Considering that the development of the Gene Ontology will continue, data analytics
will in the future contribute to aims such as developing customized drugs for each
person, applying personalized treatment methods, and improving living conditions
according to a person’s habits. For example, when an individual’s nutritional habits
are analyzed, the risk of developing a given disease can be calculated. When this risk
factor is combined with the person’s gene map, the likelihood of the disease occurring
can be estimated. The steps to be taken to prevent the disease can then be determined
from this result, and measures can be taken against the living conditions and
nutritional habits that create individual risk factors. While taking these measures,
individuals can create their own plans with the help of smart Semantic Web
applications, instead of being given strict lists. Thus, people can be made aware of
alternative nutrition methods or living conditions that can replace existing ones.
¹ SNOMED Ontology, https://bioportal.bioontology.org/ontologies/SNOMEDCT, last accessed:
November 30, 2020.
² DRON, Drug Ontology, https://bioportal.bioontology.org/ontologies/DRON, last accessed:
November 30, 2020.
indexed attributes. The ranking step calculates the relevance of the word or phrase
searched to the information obtained from the search. In other words, the relationship
between the documents obtained as a result of the search and the search query is
calculated and the documents are displayed to the end-user according to this
calculation.
This four-step architecture of search engines can be transformed into a smart
system that carries out intelligent searches by using the Semantic Web and
ontologies. Search engines can develop data concepts to collect and access similar data
from the Internet and the Web. At the same time, ontologies can be used to find
semantic similarities in the collected information and to enrich the data. Search
engines can use the synonyms stored in ontologies in the indexing step and again in
query processing. The inference mechanisms and synonym storage capabilities of
ontologies can also be used when transforming the searched words or phrases into a
query: the synonyms of the words entered by the end-user are found through
ontologies and searched together with the original words. Thus, a correct and
broader flow of information can be provided to the end-user. In addition, ontology
matching approaches can be used at this stage to establish the links between the query
and the search results. Moreover, during the ranking process, similar data concepts
can be shown to users by using the data model generated to store the data.
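Query expansion with synonyms, followed by a simple ranking, can be sketched as follows. The synonym sets and documents are invented, and the score (a count of matching words) is a deliberately naive stand-in for the relevance calculation of a real search engine.

```python
# Hypothetical synonym sets, standing in for labels stored in an ontology.
SYNONYM_SETS = [
    {"car", "automobile", "vehicle"},
    {"cheap", "inexpensive", "affordable"},
]

# A tiny invented document collection.
DOCUMENTS = {
    "doc1": "affordable automobile reviews",
    "doc2": "cheap car rental and car repair",
    "doc3": "garden furniture catalogue",
}


def expand(query):
    """Add every synonym of each query word to the set of search terms."""
    terms = set(query.lower().split())
    for word in list(terms):
        for group in SYNONYM_SETS:
            if word in group:
                terms |= group
    return terms


def rank(query):
    """Score documents by how many of their words match an expanded term."""
    terms = expand(query)
    scores = {}
    for doc_id, text in DOCUMENTS.items():
        score = sum(1 for word in text.lower().split() if word in terms)
        if score:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


results = rank("cheap car")
print(results)
```

Without the expansion step, doc1 would not be returned at all, since it contains neither of the words the user typed.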
Nowadays, most people share their feelings and thoughts through social media
applications. With the widespread use of social media, people’s likes, their positive
or negative comments about a film or product, and their interactions with each other
have become important. Recently, feedback shared on social media has also attracted
the attention of companies. In the past, before social media was in use, it took a
company some time to collect user reviews for a new product. Now, people use the
company’s product and then share their positive or negative opinions instantly on
social media, and the company can see these posts seconds later. The information that
users share voluntarily via social media has become a fast source of data for
companies.
Comments and feedback given by users are among the inputs that companies can use in
their decision-making. Many companies include customer feedback in their business
intelligence applications. For example, before releasing a new phone model, a phone
company may consider the negative customer comments about the previous model. It can
thus decide which parts, such as the screen, battery, camera, or processor, to
highlight in the new model. Companies can obtain these customer comments from the
complaint logs or fault records they receive, as well as from comments made on social
networks. They can combine the comments collected from social media with the comments
submitted directly to the company and then apply data analytics methods to them.
Thus, the company can identify the deficiencies of the product and make improvements
in the new product it is releasing.
In addition, specific surveys can be conducted on social media to learn about the
general structure of societies. Social networks are applications that people from
throughout society can easily access. Because they are so widely used, any
information, such as a survey or a news item, can spread to the public in a short
time. For example, companies that want to exploit the speed of social media carry out
marketing activities for their newly developed products through social networks.
Companies also benefit from the power of social media to identify potential customers
and to retain their existing customers.
Some of the methods used in the analysis of social network data are data analytics
methods. Especially in applications where user comments are analyzed, using data
analytics methods together with ontologies will provide more accurate results. For
example, the abbreviations used in user comments can differ from one society to
another, so these abbreviations need to be evaluated semantically. Language-based
differences and restrictions can be handled through ontologies, which makes it
possible to evaluate user comments semantically. In addition, the multilingual support
available for ontologies allows user comments in different languages to be included in
the analysis.
Companies can advertise their own products through social media. Searches made in
social media applications to learn about a product are evaluated, and similar products
from other companies are shown in the application as advertisements. In this way,
companies get the opportunity to market their own products. Similarly, the pages,
products, and links that are suggested to us are also produced using data analytics
methods. Every word we share is included in the data analytics process. Social media
applications extract attributes of their users and use these attributes to find similar
users. They then show us, in the form of advertisements, the pages and products liked
by users whose profiles are similar to ours. In this profile extraction process,
ontologies can be used to store instances and, in combination with data analytics
methods, to calculate the similarity of profiles. Because ontologies can store
instances, these instances can be used as sample data in analysis processes.
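One simple way to compare profiles stored as instances is the Jaccard similarity of their attribute sets. The sketch below uses invented users, interests, and pages, and the threshold of 0.3 is an arbitrary assumption; the chapter does not prescribe a particular similarity measure.

```python
# Invented user profiles: each user is an instance with a set of interest concepts.
PROFILES = {
    "user_a": {"photography", "hiking", "smartphones"},
    "user_b": {"photography", "smartphones", "gaming"},
    "user_c": {"cooking", "gardening"},
}
# Pages each user has liked (also invented).
LIKED_PAGES = {
    "user_b": {"camera_store", "game_news"},
    "user_c": {"recipe_hub"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_pages(user, threshold=0.3):
    """Suggest pages liked by users whose profile is similar enough to the given user."""
    suggestions = set()
    for other, interests in PROFILES.items():
        if other != user and jaccard(PROFILES[user], interests) >= threshold:
            suggestions |= LIKED_PAGES.get(other, set())
    return sorted(suggestions)


similarity_ab = jaccard(PROFILES["user_a"], PROFILES["user_b"])
suggested = suggest_pages("user_a")
print(similarity_ab, suggested)
```

An ontology could refine this comparison, for example by treating two different interests as related when they share a parent concept, which plain set overlap cannot capture.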
Rapidly increasing data size and growing commercial competition have led many
companies to use data analytics methods. Data analytics techniques can determine
the methods to be followed in order to increase a company’s market share, and can
also guide the decisions that companies should take in
their internal processes. Data analytics techniques can be used not only by
• Although using ontologies in data analytics, data acquisition, or data integration
generally has positive results, it may cause extra workload under some conditions.
For example, in a data analytics architecture where only numerical data is obtained
and processed, adding ontologies to the preprocessing step may introduce extra checks
and decrease the performance of the architecture. Therefore, deciding whether to use
ontologies is an important issue.
• In addition to deciding whether ontologies should be included in data analytics, it
is also important to determine at which stage (data collection, data preprocessing,
data analysis, or interpretation of results) they should be included. If the wrong
stage is chosen, the system may not work properly, or the ontology may not contribute
to the results obtained.
• When companies or experts decide to use ontologies, the data analytics techniques
they currently use have to be calibrated. This requires expert personnel who know
the properties of ontologies well and can exploit their advantages when calibrating
these techniques. Either an expert in ontologies must be hired or existing personnel
must be trained, which takes time.
• The inclusion of synonyms and foreign language support in a data analytics
architecture through ontologies can lead to longer analysis times.
• Data security is also an important problem at every stage of data analytics. If a
company fails to provide data security, it could lose the trust of its customers or
suffer financial loss.
The combined use of data analytics and ontologies will create new opportunities
in data management, storage, and processing (Konys, 2016). More architectures that
combine ontologies with data analytics techniques can be expected in the near
future. For example, it will be possible to obtain information about our health
status through our mobile phones and to direct first aid personnel automatically in
an emergency. With the applications that will enter our lives, people’s behavior,
social media posts, and phone messages will be analyzed to determine their emotional
state. Many products that are standardized today will be personalized in the future
through ontologies that continue to be developed, such as gene ontologies, or through
ontologies that are yet to be defined.
6.5 Conclusion
Nowadays, with the development of technology, data size has increased rapidly, and
storing data, processing data, and extracting information from data have become
important. Data analytics architectures are one of the methods used to develop a structure
that works in harmony with each of these processes. Data analytics architectures are
References
Berners-Lee, T., & Fischetti, M. (2001). Weaving the web: The original design and ultimate destiny
of the World Wide Web by its inventor. DIANE Publishing Company.
Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge
discovery in databases. AI Magazine, 17(3), 37–54.
Kolli, R. (2008). Scalable matching of ontology graphs using partitioning. Doctoral dissertation,
University of Georgia.
Konys, A. (2016, October). Ontology-based approaches to big data analytics. In International
Multi-conference on Advanced Computer Systems (pp. 355–365). Cham: Springer.
Malik, S., & Jain, S. (2021, February). Semantic ontology-based approach to enhance text
classification. In International Semantic Intelligence Conference, Delhi, India. 25–27 Feb 2021.
CEUR Workshop Proceedings (Vol. 2786, pp. 85–98). Retrieved from
http://ceur-ws.org/Vol-2786/Paper16.pdf
Mehla, S., & Jain, S. (2020). An ontology supported hybrid approach for recommendation in
emergency situations. Annals of Telecommunications, 75(7), 421–435.
Neuböck, T., Neumayr, B., Schrefl, M., & Schütz, C. (2014). Ontology-driven business intelligence
for comparative data analysis. In E. Zimányi (Ed.), Business intelligence. eBISS 2013. Lecture
Notes in Business Information Processing (Vol. 172). Cham: Springer.
https://doi.org/10.1007/978-3-319-05461-2_3
Rosse, C., & Mejino, J. L., Jr. (2003). A reference ontology for biomedical informatics: The
foundational model of anatomy. Journal of Biomedical Informatics, 36(6), 478–500.
Quamar, A., Özcan, F., Miller, D., Moore, R. J., Niehus, R., & Kreulen, J. (2020). Conversational
BI: An ontology-driven conversation system for business intelligence applications. Proceedings
of the VLDB Endowment, 13(12), 3369–3381.
Tsai, C. W., Lai, C. F., Chao, H. C., & Vasilakos, A. V. (2015). Big data analytics: A survey.
Journal of Big Data, 2(1), 1–32.
Fatmana Şentürk received the Ph.D. degree in computer science from the Department of
Computer Engineering, Ege University, in 2019 and the MS and BS degrees in computer science
from the Department of Computer Engineering, Pamukkale University, in 2012 and 2008,
respectively. She is currently a research assistant at Pamukkale University in Turkey. Her
research interests include data science, ontology matching, graphs, and graph algorithms.