Context-Based Ontology Modelling For Database: Enabling Chatgpt For Semantic Database Management

This research paper discusses the integration of ChatGPT into database management by developing a syntax that translates database semantics into natural language, enhancing ChatGPT's ability to perform tasks like semantic integration and table joining. The proposed Context-based Ontology Modelling for Database (COM-DB) demonstrates improved accuracy and efficiency in database operations while addressing privacy concerns. The findings suggest a promising direction for AI-based database management, potentially reducing the need for extensive domain knowledge and facilitating automatic operations.


ChatGPT for Semantic Database Management 1

arXiv:2303.07351v1 [cs.DB] 11 Mar 2023

Context-based Ontology Modelling for Database: Enabling ChatGPT for Semantic
Database Management
Wenjun Lin1,2, Paul Babyn1, Yan Yan2 and Wenjun Zhang1*
1* College of Engineering, University of Saskatchewan, 57 Campus
Dr, Saskatoon, S7N 5A9, SK, Canada.
2 College of Science, Thompson Rivers University, 835 University
Dr, Kamloops, V2C 0C8, BC, Canada.

*Corresponding author(s). E-mail(s): [email protected];


Contributing authors: [email protected];
[email protected]; [email protected];

Abstract
This research paper explores the use of ChatGPT in database man-
agement. ChatGPT, an AI-powered chatbot, has limitations in per-
forming tasks related to database management due to the lack
of standardized vocabulary and grammar for representing database
semantics. To address this limitation, the paper proposes a solu-
tion that involves developing a set of syntaxes that can repre-
sent database semantics in natural language. The syntax is used
to convert database schemas into natural language formats, pro-
viding a new application of ChatGPT in database management.
The proposed solution is demonstrated through a case study where
ChatGPT is used to perform two tasks: semantic integration and
tables joining. Results demonstrate that the use of semantic database
representations produces more precise outcomes and avoids common
mistakes compared to cases with no semantic representation.
The proposed method has the potential to speed up the database
management process, reduce the level of understanding required for
database domain knowledge, and enable automatic database operations
without accessing the actual data, thus alleviating privacy protection
concerns when using AI. This paper provides a promising new
direction for research in the field of AI-based database management.


Keywords: ChatGPT, Database management, Semantic database
representation, Semantic integration, Tables joining

1 Introduction
ChatGPT is a conversational chatbot that uses artificial intelligence (AI) and
machine learning (ML) techniques, combined with natural language process-
ing (NLP) methods, to produce human-like text. It was launched in November
2022 and quickly gained popularity, reaching over one million users within just
five days [1]. ChatGPT’s ability to produce human-like text and perform a wide
range of tasks has made it a popular tool for many purposes, including
answering questions, writing short stories, composing music, solving math problems,
performing language translations, and even computer programming.
Database operation involves the manipulation of data and information
using specific syntax or commands, similar to computer programming. In
database operations, these commands are known as database queries, which
are used to retrieve, update, and manipulate data stored in a database. Sim-
ilarly, in computer programming, instructions are written in a programming
language, such as C or Python, in order to specify the desired behaviour of a
computer program.
There has been an expectation that ChatGPT could assist in creating
database queries, just as it can assist in creating computer programs. However,
creating database queries requires an understanding of the database itself, and
there is no conventional way to represent database semantics. This problem
limits ChatGPT’s ability to perform tasks related to database management.
In this paper, we present a solution to this problem by developing a set
of syntaxes that can represent database semantics, such as table structure and
relationships, in natural language. This allows for the creation of semantic rep-
resentations of databases that can be understood by ChatGPT and enable it
to perform database management tasks. Our work is demonstrated through a
case study, where ChatGPT is used to perform two tasks: semantic integra-
tion and table joining. Our results show that the use of semantic database
representations produces more precise outcomes and avoids common mistakes
compared to cases with no semantic representation.
The proposed method transforms database schemas into natural language
formats, providing a new application of ChatGPT in database management.
This study has the potential to speed up the database management process,
reduce the level of understanding required for database domain knowledge,
and enable automatic database operations without accessing the actual data,
thus alleviating privacy protection concerns when using AI.
The rest of the paper is organized as follows: In Section 2, we provide a
review of related work in the area of database management using AI. Then,
we describe our proposed solution in Section 3. Section 4 presents the results
and discussion of our case study. Finally, we discuss the potential benefits and
limitations of our work and conclude with future directions for research.

2 Literature review
2.1 AI-based database queries generation
The use of AI models for generating database queries through natural language
has been the focus of several research studies. One such model proposed by
Bais et al. [2] utilizes NLP techniques to analyze and interpret user queries
by performing morphological, syntactic, and semantic analysis, resulting in a
valid database query in SQL. Similarly, Sawant et al. [3] implemented a system
that can generate SQL queries from text and speech input using NLP and deep
learning techniques such as Long Short Term Memory (LSTM).
Other studies, such as Ghosh et al. [4], Nagare et al. [5], and Kombade et
al. [6], have also utilized techniques such as lexical analysis, syntax analysis,
and semantic analysis to extract SQL queries from natural language input.
Kombade et al. [6] even considered the use of abbreviations in NLP to generate
SQL queries. These studies were implemented in Python with a GUI for
input and output, and the user could provide input through speech or text.
Despite the progress made in this field, limitations still exist in the ability
of AI models to accurately generate database queries from natural language
due to the complexity and ambiguity of natural language, as well as the lack
of standardized vocabulary and grammar for representing database structures.
For instance, Nagare et al. [5] mention that their system checks the validity
of the user’s query, but it is unclear how that validity is determined.
Moreover, these studies only consider basic database operations such as select,
delete, and update. Complex operations, such as joining multiple tables and
semantic integration, have not been investigated.

2.2 Semantic integration


Semantic integration is crucial for resolving mismatches in data representation
between related databases. In today’s digital world, organizations are generat-
ing and storing vast amounts of data in various databases, which often leads to
inconsistencies in the data representation. For instance, one database may use
the attribute “Social Security Number” to identify individuals, while another
database may use the attribute “SSN”. In such cases, it is crucial to determine
the relationships between these attributes in order to accurately compare and
combine the data from these databases.
Several approaches for semantic integration have been developed in recent
years. Tools like Silk [7], LIMES [8], and PARIS [9] use string similarity
metrics, functional properties, and manual configuration to detect matching
attributes. WebPie [10] and LINDA [11] are fully automatic systems that use
techniques such as neighborhood checks and block placement. MateTee [12]
and RDF2VEC [13] are more recent approaches that utilize embeddings and
machine learning to find similarities. However, the task of matching entities
can become complicated when entity names are abbreviated [14].
The existing approaches rely heavily on domain expertise and require
complex preparations. For example, dataset-based approaches [12, 13] require
determining the relationships among attributes. Keyword-based approaches
[7, 8] depend on the accuracy of metadata, while URI-based approaches [10]
strongly depend on dereferencing HTTP URIs.
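
The “SSN” example above hints at why purely string-based matching struggles with abbreviations. The following is a minimal sketch of that effect, assuming a simple character-level similarity from Python's standard `difflib` module; it is not the metric used by Silk, LIMES, or PARIS, and the helper name `header_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def header_similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (illustrative metric only)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Headers that differ only in case are matched perfectly.
same = header_similarity("Gender", "GENDER")

# An abbreviation shares very few characters with its expansion,
# so a string metric alone gives little evidence that they match.
abbrev = header_similarity("SSN", "Social Security Number")

print(f"Gender / GENDER: {same:.2f}")
print(f"SSN / Social Security Number: {abbrev:.2f}")
```

A matcher that thresholds on such a score would link Gender to GENDER but would likely miss the SSN pair, which is the kind of case where additional context is needed.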

3 Methodology
3.1 ChatGPT
ChatGPT is a language model developed by OpenAI [15]. It is a type of AI
algorithm trained to predict the likelihood of a given sequence of words based
on the context of the words that come before it. This technology is based on
self-attention mechanisms [16] and has been trained on a massive dataset of
text, allowing it to generate sophisticated and seemingly intelligent writing.
ChatGPT is designed to converse with users in English and other languages
on a wide range of topics, making it ideal for use in chatbots, customer service,
content creation, and language translation tasks.
One of the applications of ChatGPT is to assist in programming, which
can be achieved in two ways. Firstly, ChatGPT can serve as a programming
assistant or tool. For instance, developers can ask ChatGPT programming-
related questions and obtain recommendations and suggestions about general
workflows and steps. Secondly, ChatGPT can generate code snippets directly,
resulting in enhanced productivity and time-saving benefits for developers.
Despite its advanced natural language processing capability and successes
in assisting programming, ChatGPT has not yet been able to generate queries
for databases because database schemas, which contain vital information about
database structures, are frequently written in the form of a graph rather than
natural language.

3.2 Context-based Ontology Modelling for Database


This study presents a new method, Context-based Ontology Modelling for
Database (COM-DB), which is aimed at converting database schemas into
natural language. COM-DB is built upon our previous research on ontology
modelling [17], which utilizes constructs like context-of, monodirectional rela-
tionship, and bi-directional relationship to describe the relationship of concepts
in databases.
In this study, we focus on the usage of the “context-of” construct for
describing database schema, especially at the conceptual data model level. A
conceptual schema or conceptual data model is a map of concepts and their
relationships used for databases. The key feature of COM-DB is the ability to
convert these relationships into natural language, which makes it more acces-
sible to ChatGPT. To demonstrate the effectiveness of COM-DB, we provide
two examples that illustrate how the “context-of” construct can be used to
describe the relationship of headers within one table and the relationship of
tables within one database.

[Figure 1. Left panel: the table ‘Patients_Alabama’ with headers Id, BIRTHDATE, DEATHDATE, SSN, PREFIX, FIRST, LAST, SUFFIX, MAIDEN, MARITAL, RACE, ETHNICITY, GENDER, BIRTHPLACE, ADDRESS, CITY, STATE, COUNTY.
Basic schema: a table ‘Patients_Alabama’ with headers: Id, BIRTHDATE, DEATHDATE, SSN, PREFIX, FIRST, LAST, SUFFIX, MAIDEN, MARITAL, RACE, ETHNICITY, GENDER, BIRTHPLACE, ADDRESS, CITY, STATE, COUNTY.
Contextual schema: In table ‘Patients_Alabama’, headers ADDRESS, CITY, STATE, and COUNTY are in the context of patients’ address.]

Fig. 1 Describing relationships of headers within one table using COM-DB

The example in Figure 1 demonstrates how COM-DB can be used to
describe the relationships between headers within a database table. The left
side of the figure shows a typical database table schema, which includes the
name of the table and the names of its headers.
COM-DB converts this schema into two parts: the basic schema, which
lists the names of the headers in the table, and the contextual schema,
which uses the “context-of” construct to describe relationships among headers.
In this example, the headers “ADDRESS”, “CITY”, “STATE”, and
“COUNTY” are related and are used to store information about patients’
addresses.
The contextual schema condenses this information into a single sentence
by using the “context-of” construct to describe the relationship between these
headers and the patients’ addresses. Specifically, it states that “ADDRESS,
CITY, STATE, and COUNTY are in the context of patients’ address”. This
represents four relationships in one sentence, making the information easier
to understand and less verbose.
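
The two parts described for Figure 1 can be produced mechanically from a table name, its headers, and a grouping of headers under a context. The sketch below is one possible rendering, assuming the sentence templates shown in Figure 1; the function names `basic_schema` and `contextual_schema` are hypothetical and are not part of COM-DB itself.

```python
def basic_schema(table: str, headers: list[str]) -> str:
    """Render the basic schema: the table name followed by its headers."""
    return f"a table '{table}' with headers: {', '.join(headers)}."


def contextual_schema(table: str, headers: list[str], context: str) -> str:
    """Render one condensed 'context-of' sentence for a group of headers."""
    if len(headers) > 1:
        listed = ", ".join(headers[:-1]) + ", and " + headers[-1]
    else:
        listed = headers[0]
    return f"In table '{table}', headers {listed} are in the context of {context}."


address_headers = ["ADDRESS", "CITY", "STATE", "COUNTY"]
sentence = contextual_schema("Patients_Alabama", address_headers, "patients' address")
print(basic_schema("Patients_Alabama", ["Id", "BIRTHDATE", "SSN"] + address_headers))
print(sentence)
```

Deciding which headers share a context still requires someone who knows the table; the code only formats that decision into natural language.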
Figure 2 is an Entity-Relationship schema of a hospital database. Similar
to the first example, COM-DB represents the database in two parts. The first
part, the basic schema, lists the headers in each table. The basic schemas are as
follows:
[Figure 2. Entity-Relationship diagram showing the tables allergies, careplans, conditions, devices, encounters, imaging_studies, immunizations, medications, observations, organizations, patients, payers, procedures, and providers, each with its headers.]

Fig. 2 Entity-Relationship schema of a hospital database [18]

Given a table ‘allergies’ with headers: Id, START, STOP, PATIENT, ENCOUNTER, CODE, DESCRIPTION.
And a table ‘careplans’ with headers: Id, START, STOP, PATIENT, ENCOUNTER, CODE, DESCRIPTION.
And a table ‘conditions’ with headers: START, STOP, PATIENT, ENCOUNTER, CODE, DESCRIPTION.
And a table ‘devices’ with headers: START, STOP, PATIENT, ENCOUNTER, CODE, DESCRIPTION, UDI.
And a table ‘encounters’ with headers: Id, START, STOP, PATIENT, ORGANIZATION, PROVIDER, PAYER, ENCOUNTERCLASS, CODE, DESCRIPTION, BASE_ENCOUNTER_COST, TOTAL_CLAIM_COST, PAYER_COVERAGE.
And a table ‘imaging_studies’ with headers: Id, DATE, PATIENT, ENCOUNTER, BODYSITE_CODE, BODYSITE_DESCRIPTION, MODALITY_CODE, MODALITY_DESCRIPTION, SOP_CODE, SOP_DESCRIPTION.
And a table ‘immunizations’ with headers: DATE, PATIENT, ENCOUNTER, CODE, DESCRIPTION, BASE_COST.
And a table ‘medications’ with headers: START, STOP, PATIENT, PAYER, ENCOUNTER, CODE, DESCRIPTION, BASE_COST, PAYER_COVERAGE, DISPENSES, TOTALCOST.
And a table ‘observations’ with headers: DATE, PATIENT, ENCOUNTER, CODE, DESCRIPTION, VALUE, UNITS, TYPE.
And a table ‘organizations’ with headers: Id, NAME, ADDRESS, CITY, STATE, ZIP, LAT, LON, PHONE, REVENUE, UTILIZATION.
And a table ‘patients’ with headers: Id, BIRTHDATE, DEATHDATE, SSN, PREFIX, FIRST, LAST, SUFFIX, MAIDEN, MARITAL, RACE, ETHNICITY, GENDER, BIRTHPLACE, ADDRESS, CITY, STATE, COUNTY.
And a table ‘payers’ with headers: Id, NAME, ADDRESS, CITY, STATE_HEADQUARTERED, ZIP, PHONE, AMOUNT_COVERED, AMOUNT_UNCOVERED, REVENUE, COVERED_ENCOUNTERS, UNCOVERED_ENCOUNTERS, COVERED_MEDICATIONS, UNCOVERED_MEDICATIONS, COVERED_PROCEDURES, UNCOVERED_PROCEDURES, COVERED_IMMUNIZATIONS, UNCOVERED_IMMUNIZATIONS, UNIQUE_CUSTOMERS, QOLS_AVG, MEMBER_MONTHS.
And a table ‘procedures’ with headers: DATE, PATIENT, ENCOUNTER, CODE, DESCRIPTION, BASE_COST.
And a table ‘providers’ with headers: Id, ORGANIZATION, NAME, GENDER, SPECIALITY, ADDRESS, CITY, STATE, ZIP.
In addition, the second part, contextual schema, is as follows:
allergies, careplans, conditions, devices, immunizations,
observations, procedures, imaging_studies are in the context
of patients, encounters.
encounters are in the context of patients, organizations,
providers, payers.
medications are in the context of patients, encounters, payers.
providers are in the context of organizations.
Note that the contextual schema is shown in a condensed form. For
example, “allergies, careplans, conditions, devices, immunizations,
observations, procedures, imaging_studies are in the context of patients, encounters.”
represents 8 × 2 = 16 relationships. These relationships are between the 8
tables “allergies, careplans, conditions, devices, immunizations, observations,
procedures, imaging_studies” and the 2 tables “patients, encounters”.
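
The condensed form can be expanded back into individual relationships by pairing every table on the left of “are in the context of” with every table on the right. A minimal sketch, assuming the sentences follow exactly the comma-separated pattern shown above; the function name `expand_context_sentence` is hypothetical.

```python
from itertools import product

MARKER = " are in the context of "


def expand_context_sentence(sentence: str) -> list[tuple[str, str]]:
    """Expand one condensed 'context-of' sentence into (table, context) pairs."""
    left, right = sentence.strip().rstrip(".").split(MARKER)
    subjects = [name.strip() for name in left.split(",")]
    contexts = [name.strip() for name in right.split(",")]
    return list(product(subjects, contexts))


condensed = (
    "allergies, careplans, conditions, devices, immunizations, "
    "observations, procedures, imaging_studies are in the context "
    "of patients, encounters."
)
pairs = expand_context_sentence(condensed)
print(len(pairs))  # 8 tables x 2 contexts = 16
print(pairs[:2])
```

Applied to the other three sentences of the contextual schema, the same function yields 4, 3, and 1 relationships respectively.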
We use these examples to show practical applications of COM-DB and
how it can be used to generate natural language descriptions of complex
database schemas. Overall, the methodology of this study involves designing
and implementing the COM-DB method, which uses the “context-of”
construct and other ontology modelling constructs to convert database
schemas into natural language. The two examples above show how the method
describes a single table and a whole database; its effectiveness on two
sophisticated database management tasks is demonstrated in the case study.

4 Case study
The case study aims to showcase the efficacy of the proposed COM-DB
system. The system’s primary feature is the “context-of” construct, which
utilizes natural language to capture database semantics like table structure
and relationships. The primary objective of the system is to create semantic
representations of databases that can be easily comprehended by ChatGPT,
enabling it to perform various database management tasks.
The case study provides empirical evidence to support the effective-
ness of COM-DB. Two sample databases are collected from the literature,
Synthea Alabama [18] and BDA EHR [19]. Based on those databases, two
experiments are conducted that represent typical tasks in
database integration: semantic integration and tables joining. In both
experiments, ChatGPT is used to perform the tasks with and without the
COM-DB-based schema. Each experiment is repeated 10 times to account for
potential inconsistency in ChatGPT’s output. The results reported below
are typical of the repeated runs.

4.1 Experiment 1: Semantic Integration


Semantic integration involves merging two tables with the same category
of information. Different headers’ names among different tables often cause
incompatibility issues. ‘patients A’ and ‘patients B’ are tables of patient
information from BDA EHR and Synthea Alabama, respectively. ‘patients A’
contains headers: Id patients, Name, Surname, Date of Birth, Place of
Birth, Address, Gender, Blood Type, Job. ‘patients B’ contains headers:
Id, BIRTHDATE, DEATHDATE, SSN, PREFIX, FIRST, LAST, SUFFIX,
MAIDEN, MARITAL, RACE, ETHNICITY, GENDER, BIRTHPLACE,
ADDRESS, CITY, STATE, COUNTY.
The goal of this experiment is to identify headers from table ’patients A’
and table ’patients B’ which contain the same information, knowing that some
headers may need to be combined or split for the mapping. The ideal mapping
is illustrated in Table 1.
Figure 3 shows the input and output of using ChatGPT without COM-DB
based schema. The message from the icon ’FF’ is the input, while the message
from the graphical icon is the output from ChatGPT. The input contains
two parts of information. The first part is to explain the situation, which
contains the names of headers in each table. Note that only the headers are
provided here, without any sample data or data type. The second part, “Identify
the headers from table ‘patients A’ and table ‘patients B’ which contain the
same information. Some headers may need to be combined or split.”, is an
explanation of the task to be completed by ChatGPT.

Table 1 The ideal header mapping results of table ‘patients A’ and table ‘patients B’

patients A        patients B
Name              FIRST
Surname           LAST
Date of Birth     BIRTHDATE
Place of Birth    BIRTHPLACE
Address           ADDRESS, CITY, STATE, COUNTY
Gender            GENDER

Fig. 3 Experiment 1, header mapping without COM-DB based schema

The output in Figure 3 shows that ChatGPT can understand the task and
perform it to a degree. It correctly matches Date of Birth with BIRTHDATE,
Place of Birth with BIRTHPLACE, and Gender with GENDER. However, it fails
to match Name with FIRST and Surname with LAST. In addition, ADDRESS
in ‘patients B’ should be combined with the headers CITY, STATE, and COUNTY,
which ChatGPT does not notice.
Figure 4 shows the input and output of using ChatGPT with COM-DB
based schema. In addition to the inputs used in Figure 3, the ontology model
information is described as “In table ‘patients A’, headers Name and Surname
are in the context of patients’ name. In table ‘patients B’, headers ADDRESS,
CITY, STATE, and COUNTY are in the context of patients’ address.”
Fig. 4 Experiment 1, header mapping with COM-DB based schema

Figure 4 illustrates a typical result from ChatGPT and demonstrates that the
COM-DB based schema improves the performance of semantic integration:
ChatGPT successfully identifies all of the expected mappings.
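
The two inputs compared in Experiment 1 differ only in whether the contextual sentences are included. The sketch below assembles both variants as plain text. The header lists, contextual sentences, and task sentence are quoted from this section; the helper name `build_prompt` and the exact wording and layout of the header lines are assumptions, since the actual prompts appear only in Figures 3 and 4.

```python
# Header lists as given in Section 4.1.
TABLES = {
    "patients A": ["Id patients", "Name", "Surname", "Date of Birth",
                   "Place of Birth", "Address", "Gender", "Blood Type", "Job"],
    "patients B": ["Id", "BIRTHDATE", "DEATHDATE", "SSN", "PREFIX", "FIRST",
                   "LAST", "SUFFIX", "MAIDEN", "MARITAL", "RACE", "ETHNICITY",
                   "GENDER", "BIRTHPLACE", "ADDRESS", "CITY", "STATE", "COUNTY"],
}

# Contextual schema sentences used in the COM-DB variant.
CONTEXT = [
    "In table 'patients A', headers Name and Surname are in the context of "
    "patients' name.",
    "In table 'patients B', headers ADDRESS, CITY, STATE, and COUNTY are in "
    "the context of patients' address.",
]

TASK = ("Identify the headers from table 'patients A' and table 'patients B' "
        "which contain the same information. Some headers may need to be "
        "combined or split.")


def build_prompt(tables, task, context=None):
    """Join header descriptions, optional context sentences, and the task."""
    # Assumed wording for the header lines; only header names are included,
    # never sample data or data types.
    lines = [f"Table '{name}' contains headers: {', '.join(headers)}."
             for name, headers in tables.items()]
    if context:
        lines.extend(context)
    lines.append(task)
    return "\n".join(lines)


without_com_db = build_prompt(TABLES, TASK)
with_com_db = build_prompt(TABLES, TASK, CONTEXT)
print(with_com_db)
```

Keeping the two prompts identical apart from the context sentences makes it easier to attribute any difference in ChatGPT's answers to the contextual schema.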

4.2 Experiment 2: Tables Joining


Tables joining involves generating a new table or view that combines data
from multiple tables. This process requires an Entity Relationship schema
which doesn’t have a conventional way to be represented in natural language.
To demonstrate the effectiveness of COM-DB based schema in representing
the Entity Relationship, Synthea Alabama is used in this experiment. The
database contains 14 tables as shown in Figure 2. The goal of the experiment
is to create a SQL query that generates a list of careplans, with corresponding
providers’ and patients’ identity information. A careplan is advice given by a
provider (such as a physician) to a patient. The SQL query needs to properly
join four tables: careplans, providers, patients, and encounters. The encounters
table plays a critical role here: the careplans table has no PROVIDER header, so
it can only be connected to the providers table through encounters. This
information is typically contained in an Entity Relationship schema.
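
To make the role of the encounters table concrete, the following sketch runs a four-table join on a tiny in-memory SQLite database. It is not the query produced by ChatGPT in Figure 7. The key relationships (careplans.ENCOUNTER to encounters.Id, encounters.PROVIDER to providers.Id, careplans.PATIENT to patients.Id) are assumptions inferred from the header names, only a subset of headers is created, and the single row per table is invented placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Subset of the headers listed in the basic schema.
    CREATE TABLE patients  (Id TEXT, "FIRST" TEXT, "LAST" TEXT);
    CREATE TABLE providers (Id TEXT, NAME TEXT);
    CREATE TABLE encounters(Id TEXT, PATIENT TEXT, PROVIDER TEXT);
    CREATE TABLE careplans (Id TEXT, PATIENT TEXT, ENCOUNTER TEXT, DESCRIPTION TEXT);

    -- Invented placeholder rows, for illustration only.
    INSERT INTO patients   VALUES ('p1', 'Ada', 'Example');
    INSERT INTO providers  VALUES ('d1', 'Dr. Placeholder');
    INSERT INTO encounters VALUES ('e1', 'p1', 'd1');
    INSERT INTO careplans  VALUES ('c1', 'p1', 'e1', 'Sample care plan');
""")

# careplans reaches providers only through encounters.
QUERY = """
    SELECT careplans.DESCRIPTION, providers.NAME,
           patients."FIRST", patients."LAST"
    FROM careplans
    JOIN encounters ON careplans.ENCOUNTER = encounters.Id
    JOIN providers  ON encounters.PROVIDER = providers.Id
    JOIN patients   ON careplans.PATIENT   = patients.Id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
conn.close()
```

The contextual schema sentence “encounters are in the context of patients, organizations, providers, payers” is what tells a reader, or ChatGPT, that this intermediate join exists.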
Figure 5 shows the input and output of using ChatGPT without COM-DB
based schema. The input contains two parts of information. The first part is to
explain all tables with their contained headers in alphabetical order. Similar
to Experiment 1, only the headers are provided here without any sample data
or data type. The second part, “To create a SQL query that generates a list of
careplans, with corresponding providers’ and patients’ identity information.”
is an explanation of the task to be completed by ChatGPT.

Fig. 5 Experiment 2, generating a new view from multiple tables without COM-DB based
schema. The conversation is split into two columns, from left to right.

The output from ChatGPT is verified by executing the SQL query in
the hospital database. The result is shown in Figure 6. The query fails
with the error “no such column: careplans.PROVIDER”. The root cause of
this error is that the query omits the encounters table, as explained earlier.
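
An error of this kind can be detected without touching any real data, by executing the generated query against empty tables built from the basic schema. The sketch below assumes SQLite as the engine; the helper name `check_query` and both example queries are hypothetical and are not the queries shown in Figures 5 and 7.

```python
import sqlite3

# Headers copied from the basic schema in Section 3.2.
SCHEMA = {
    "careplans": ["Id", "START", "STOP", "PATIENT", "ENCOUNTER", "CODE",
                  "DESCRIPTION"],
    "encounters": ["Id", "START", "STOP", "PATIENT", "ORGANIZATION",
                   "PROVIDER", "PAYER", "ENCOUNTERCLASS", "CODE",
                   "DESCRIPTION", "BASE_ENCOUNTER_COST", "TOTAL_CLAIM_COST",
                   "PAYER_COVERAGE"],
    "providers": ["Id", "ORGANIZATION", "NAME", "GENDER", "SPECIALITY",
                  "ADDRESS", "CITY", "STATE", "ZIP"],
}


def check_query(headers_by_table, query):
    """Run a query on empty tables; return the error text, or None if it runs."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, headers in headers_by_table.items():
            columns = ", ".join(f'"{h}"' for h in headers)
            conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.execute(query)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


# Hypothetical query that skips the encounters table.
DIRECT_JOIN = """
    SELECT careplans.DESCRIPTION, providers.NAME
    FROM careplans JOIN providers ON careplans.PROVIDER = providers.Id
"""
# Hypothetical query that goes through encounters.
VIA_ENCOUNTERS = """
    SELECT careplans.DESCRIPTION, providers.NAME
    FROM careplans
    JOIN encounters ON careplans.ENCOUNTER = encounters.Id
    JOIN providers  ON encounters.PROVIDER = providers.Id
"""
direct_error = check_query(SCHEMA, DIRECT_JOIN)
via_error = check_query(SCHEMA, VIA_ENCOUNTERS)
print(direct_error)
print(via_error)
```

A check like this only validates table and column references; it says nothing about whether the join conditions are semantically right, which is where the contextual schema is meant to help.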
Figure 7 shows the input and output of using ChatGPT with COM-DB
based schema that explains the context information of each table. In this
case, the COM-DB based schema describes relations between tables. Figure 8
verifies the SQL query by executing it in the hospital database. It shows that
ChatGPT has successfully generated the query that results in a correct view.
Fig. 6 Experiment 2, SQL query results without COM-DB based schema

Fig. 7 Experiment 2, generating a new view from multiple tables with COM-DB based
schema. The conversation is split into two columns, from left to right.

Fig. 8 Experiment 2, SQL query results with COM-DB based schema

5 Discussions
The results of the experiments indicate that ChatGPT performs better in
both semantic integration and tables joining tasks when using the
COM-DB-based schema. The context information provided by the ontology models
helps ChatGPT to better complete the tasks. The study demonstrates that
the “context-of” construct in COM-DB captures certain context information
which may not be included in a conventional ontology model. This context
information determines the relations between concepts, which helps to eliminate
ambiguities between concepts and increases the chances of success during
database integration.
In addition, COM-DB enables automatic database operations without
compromising privacy protections. Unlike AI-enabled database operations
that require the AI algorithm to access the data in the database, COM-DB
generates a schema from the table structure only, not from its content. This
schema contains table and header names but no records, making it safer to
share with third-party services such as ChatGPT. Because only the schema is
sent, ChatGPT can generate database operations without ever accessing
sensitive data, which significantly reduces the risk of a privacy breach. This
allows businesses and organizations to use AI and automation to streamline
their operations and improve efficiency without exposing their customers’
data to the AI service.
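
The structure-only property can be illustrated with a short sketch that derives basic schema sentences from a database's metadata without issuing any query against the rows. It assumes a SQLite database and uses the `sqlite_master` catalog and `PRAGMA table_info`; the function name `basic_schema_from_sqlite` is hypothetical, and the inserted row is an invented placeholder used only to show that its values never reach the output.

```python
import sqlite3


def basic_schema_from_sqlite(conn: sqlite3.Connection) -> list[str]:
    """Build basic schema sentences from table metadata only (no row access)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    sentences = []
    for table in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        headers = [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
        sentences.append(f"a table '{table}' with headers: {', '.join(headers)}.")
    return sentences


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (Id TEXT, SSN TEXT, ADDRESS TEXT)")
# Invented placeholder record; it must not appear in the generated schema.
conn.execute("INSERT INTO patients VALUES ('p1', '000-00-0000', '1 Placeholder Road')")

schema_sentences = basic_schema_from_sqlite(conn)
print(schema_sentences)
conn.close()
```

Header names themselves can still be revealing in some settings, so whether a schema is safe to share remains a judgment for the data owner.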

6 Conclusion
This paper explores the use of ChatGPT in the area of database management,
highlighting the challenges of using natural language processing to perform
database queries. Our research presents a solution by developing a set of syn-
taxes to represent database semantics in natural language. These syntaxes,
called COM-DB, enable ChatGPT to perform tasks related to database man-
agement, such as semantic integration and tables joining. Our case study shows
that the use of semantic representations in database management leads to more
precise outcomes and reduces common mistakes compared to cases without
such representations.
Our research aims to contribute to the field of database management
by introducing a novel approach for converting database schemas into nat-
ural language format, thereby opening up new applications for ChatGPT.
This approach has the potential to deliver significant benefits, including
faster database management, reduced domain knowledge requirements, and
enhanced privacy protection through automated database operations that do
not require access to actual data.
Future work involves expanding the scope of our method to include more
complex database operations and testing it on larger databases. Furthermore,
we intend to investigate the feasibility of incorporating other natural language
processing models into database management and explore the possibilities of
combining various models to enhance their capabilities.
In conclusion, our research demonstrates the potential of natural lan-
guage processing models to be employed in the field of database management,
providing a new way to interact with and manipulate databases. By lever-
aging the power of ChatGPT alongside our COM-DB syntaxes, we have
demonstrated that complex database operations can be executed using natu-
ral language, offering a new approach to simplify database management and
enhance productivity.
Acknowledgments. This study is supported by the Natural Sciences and
Engineering Research Council of Canada, Alliance grant #ALLRP 555161-20.

References
[1] Thorp, H.H.: ChatGPT is fun, but not an author. American Association
for the Advancement of Science (2023)

[2] Bais, H., Machkour, M., Koutti, L.: A model of a generic natural language
interface for querying database. International Journal of Intelligent Systems
and Applications 8(2), 35 (2016)

[3] Sawant, A., Raina, R., Patil, A., Pardeshi, A.: AI model to generate SQL
queries from natural language instructions through voice. In: Journal of
Physics: Conference Series, vol. 2273, p. 012014 (2022). IOP Publishing

[4] Ghosh, P.K., Dey, S., Sengupta, S.: Automatic SQL query formation from
natural language query. International Journal of Computer Applications
975, 8887 (2014)

[5] Nagare, P., Indhe, S., Sabale, D., Thorat, G., Chaturvedi, P.: Automatic
SQL query formation from natural language query. Int. Res. J. Eng. Technol
4, 1589–1591 (2017)

[6] Kombade, C., More, M., Pujari, A., Patil, S.: Natural language processing
with some abbreviation to SQL. International Journal for Research in Applied
Science and Engineering Technology 8(5), 1046–1048 (2020)
[7] Volz, J., Bizer, C., Gaedke, M., Kobilarov, G.: Silk-a link discovery
framework for the web of data. Ldow 538, 53 (2009)

[8] Ngomo, A.-C.N., Auer, S.: Limes-a time-efficient approach for large-scale
link discovery on the web of data. integration 15(3) (2011)

[9] Suchanek, F.M., Abiteboul, S., Senellart, P.: Paris: Probabilistic align-
ment of relations, instances, and schema. arXiv preprint arXiv:1111.7164
(2011)

[10] Urbani, J., Kotoulas, S., Maassen, J., Van Harmelen, F., Bal, H.: Webpie:
A web-scale parallel inference engine using mapreduce. Journal of Web
Semantics 10, 59–75 (2012)

[11] Böhm, C., De Melo, G., Naumann, F., Weikum, G.: Linda: distributed
web-of-data-scale entity matching. In: Proceedings of the 21st ACM Inter-
national Conference on Information and Knowledge Management, pp.
2104–2108 (2012)

[12] Morales, C., Collarana, D., Vidal, M.-E., Auer, S.: Matetee: A semantic
similarity metric based on translation embeddings for knowledge graphs.
In: Web Engineering: 17th International Conference, ICWE 2017, Rome,
Italy, June 5-8, 2017, Proceedings 17, pp. 246–263 (2017). Springer

[13] Ristoski, P., Paulheim, H.: Rdf2vec: Rdf graph embeddings for data min-
ing. In: The Semantic Web–ISWC 2016: 15th International Semantic Web
Conference, Kobe, Japan, October 17–21, 2016, Proceedings, Part I 15,
pp. 498–514 (2016). Springer

[14] Lantzaki, C., Papadakos, P., Analyti, A., Tzitzikas, Y.: Radius-aware
approximate blank node matching using signatures. Knowledge and
Information Systems 50, 505–542 (2017)

[15] van Dis, E.A., Bollen, J., Zuidema, W., van Rooij, R., Bockting, C.L.:
ChatGPT: five priorities for research. Nature 614(7947), 224–226 (2023)

[16] Humphreys, G.W., Sui, J.: Attentional control and the self: the self-
attention network (san). Cognitive neuroscience 7(1-4), 5–17 (2016)

[17] Lin, W., Babyn, P., Yan, Y., Zhang, W.: Ontology in the modern computer
era. Enterprise Information Systems (2023, submitted)

[18] Walonoski, J., Kramer, M., Nichols, J., Quina, A., Moesel, C., Hall,
D., Duffett, C., Dube, K., Gallagher, T., McLachlan, S.: Synthea: An
approach, method, and software mechanism for generating synthetic
patients and the synthetic electronic health care record. Journal of the
American Medical Informatics Association 25(3), 230–238 (2018)
[19] Silvestri, S., Esposito, A., Gargiulo, F., Sicuranza, M., Ciampi, M.,
De Pietro, G.: A big data architecture for the extraction and analysis of
EHR data. In: 2019 IEEE World Congress on Services (SERVICES), vol.
2642, pp. 283–288 (2019). IEEE
