Dbms Lab El Report

The project report details the development of an NLP-based system for automated SQL query generation, allowing users to input natural language queries that are converted into SQL commands using OpenAI's GPT-3.5-turbo model. The system enhances accessibility for non-technical users by integrating database schema analysis and providing a user-friendly web interface. Future enhancements include support for additional databases, advanced query optimization, and voice-based querying.

Uploaded by

Saksham Singh

RV COLLEGE OF ENGINEERING®

BENGALURU – 560059
(Autonomous Institution Affiliated to VTU, Belagavi)

In Partial Fulfilment
of the Requirements for the Experiential Learning component for

DATABASE MANAGEMENT SYSTEMS


(CD252IA)

LABORATORY PROJECT REPORT

Automated SQL Query Generation Using NLP


Submitted by
SAKSHAM SINGH (1RV22CD050)
ROHAN KURUP (1RV22CD049)
ROHAN GANESH (1RV22CD048)

Under the Guidance of

PROF PAVITHRA H
Department of CSE, RVCE, Bengaluru - 560059
RV COLLEGE OF ENGINEERING®
BENGALURU – 560059
(Autonomous Institution Affiliated to VTU, Belagavi)
CERTIFICATE

This is to certify that the project titled “Automated SQL Query Generation Using NLP” has been
executed by ROHAN GANESH (1RV22CD048), SAKSHAM SINGH (1RV22CD050) and
ROHAN KURUP (1RV22CD049), students of R.V. College of Engineering, Bengaluru. It is
further certified that all recommendations and corrections proposed during the Internal
Assessment process have been incorporated into the final report. The report has been reviewed
and approved, meeting the requisite academic standards for experiential learning.

Marks Awarded:
ABSTRACT

This study focuses on developing a Natural Language Processing (NLP)-based
system to automate SQL query generation. The proposed system
allows users to input queries in natural language, which are then translated
into structured SQL commands and executed on a database. By leveraging
OpenAI’s GPT-3.5-turbo model, the system ensures high accuracy in
interpreting user queries and converting them into executable SQL
statements.
A crucial aspect of this approach is the extraction of database schema
information using SQLAlchemy. The schema is then represented as a
knowledge graph using NetworkX, which helps the model generate more
contextually accurate queries. The system features a web-based interface
that allows users to upload database files, input natural language queries,
and view query results in real-time.
Experimental results demonstrate that the proposed method improves the
accessibility of databases for non-technical users while maintaining the
accuracy of generated queries. The system can be extended to support
different types of databases, advanced query optimization, and integration
with AI-driven assistants for a more seamless user experience.
INTRODUCTION
Databases are an essential component of modern applications, enabling the
structured storage and retrieval of vast amounts of data. However,
interacting with these databases requires knowledge of Structured Query
Language (SQL), which can be a barrier for non-technical users. Writing
SQL queries requires understanding database schemas, relationships, and
query syntax, which can be challenging for those without prior experience.
As a result, many organizations rely on data analysts or database
administrators to execute complex queries, leading to bottlenecks in
decision-making and inefficiencies in workflow.

With the advancement of Natural Language Processing (NLP) and artificial
intelligence, it is now possible to bridge this gap by enabling users to query
databases using plain English. This project explores an automated system
that converts natural language queries into SQL statements using OpenAI’s
GPT-3.5-turbo model. By integrating database schema understanding
through SQLAlchemy and NetworkX, the system enhances query accuracy
and usability, allowing users to retrieve information effortlessly without
SQL knowledge.
PROBLEM STATEMENT
Interacting with databases typically requires expertise in SQL, which can be
a significant barrier for many users, including business professionals,
researchers, and analysts who may not have programming skills. Some key
challenges include:

1. Lack of SQL Knowledge – Many users struggle to write correct
SQL queries due to limited exposure to SQL syntax and concepts such
as joins, aggregations, and nested queries.

2. Complex Database Schemas – Large databases contain multiple
tables with intricate relationships, making it difficult for users to
determine how to structure their queries.

3. Time-Consuming Query Process – Non-technical users must rely on
database administrators or developers to retrieve information, causing
delays in decision-making.

4. Prone to Human Errors – Manually written queries can lead to
syntax errors or incorrect results, particularly for complex queries.

5. Scalability Issues – In organizations where multiple users require
database access, relying on a small group of SQL experts is inefficient
and does not scale well.

To address these issues, an NLP-based system is proposed to automatically
generate SQL queries from natural language inputs, reducing dependency
on SQL experts and improving efficiency in data retrieval.
OBJECTIVES
The primary objectives of this project are:
• To develop an NLP-based system that can interpret natural
language queries and generate SQL statements accurately.
• To integrate database schema analysis using SQLAlchemy to
extract table structures and relationships.
• To construct a schema knowledge graph using NetworkX to
improve the contextual understanding of table relations.
• To build a web-based user interface that enables users to upload
database files, enter queries, and view results without needing
SQL expertise.
• To evaluate the system’s accuracy, efficiency, and usability
through testing on various databases and query types.
• To explore potential improvements such as query optimization,
support for multiple database management systems (DBMS),
and enhanced AI models.
METHODOLOGY
The development of the NLP-based SQL query generation system
follows a structured approach consisting of the following steps:
1. Database Schema Extraction
• The user uploads an SQLite database file.
• SQLAlchemy is used to connect to the database and extract
metadata, including table names, column names, data types,
and foreign key relationships.
2. Building a Knowledge Graph
• NetworkX is used to create a directed graph where:
• Tables are represented as nodes.
• Columns are linked to their respective tables.
• Foreign key constraints are represented as edges between
tables.
• This graph provides contextual information for query
generation.
3. Natural Language Query Processing
• The user inputs a natural language query (e.g., “Find the total
sales for each product category”).
• OpenAI’s GPT-3.5-turbo model processes the query and,
using schema context, generates an appropriate SQL
statement.
4. SQL Query Execution
• The generated SQL query is executed against the uploaded
database using SQLAlchemy.
• The results are fetched and formatted for display.
5. User Interface
• A Flask-based web interface allows users to interact with
the system, upload databases, input queries, and view
results.
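Steps 1 to 3 above can be sketched roughly as follows. This is a minimal illustration and not the project's code: it swaps in Python's standard sqlite3 module and SQLite PRAGMA statements where the report uses SQLAlchemy, and a plain dictionary of sets where the report uses a NetworkX directed graph. The function names (extract_schema, build_graph, schema_context) and the two-table sample schema are assumptions made for the example.

```python
import sqlite3


def extract_schema(conn):
    """Return {table: {"columns": [(name, type)], "foreign_keys": [(col, ref_table, ref_col)]}}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = [(row[3], row[2], row[4]) for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "foreign_keys": fks}
    return schema


def build_graph(schema):
    """Directed adjacency map: one edge from a table to each table it references."""
    graph = {table: set() for table in schema}
    for table, info in schema.items():
        for _column, ref_table, _ref_column in info["foreign_keys"]:
            graph[table].add(ref_table)
    return graph


def schema_context(schema):
    """Render tables, columns, and foreign keys as text that can be placed in a model prompt."""
    lines = []
    for table, info in schema.items():
        cols = ", ".join(f"{name} {ctype}".strip() for name, ctype in info["columns"])
        lines.append(f"{table}({cols})")
        for column, ref_table, ref_column in info["foreign_keys"]:
            lines.append(f"  {table}.{column} -> {ref_table}.{ref_column}")
    return "\n".join(lines)


# Illustrative two-table database (hypothetical subset, column types assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patients (patient_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE Hospital_Bills (
        bill_id INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES Patients(patient_id),
        total_amount REAL
    );
""")
schema = extract_schema(conn)
graph = build_graph(schema)
print(schema_context(schema))
```

The text produced by schema_context is the kind of schema context that step 3 would pass to the language model alongside the user's question.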
IMPLEMENTATION
The database contains the following tables and relationships:

Entities and Attributes


1. Patients
• patient_id (Primary Key)
• first_name
• last_name
• date_of_birth
• gender
• contact_number
• address
2. Doctors
• doctor_id (Primary Key)
• first_name
• last_name
• specialty
• contact_number
• department_id
3. Appointments
• appointment_id (Primary Key)
• patient_id (Foreign Key → Patients)
• doctor_id (Foreign Key → Doctors)
• appointment_date
• appointment_time
• diagnosis
4. Departments
• department_id (Primary Key)
• name
5. Medications
• medication_id (Primary Key)
• name
• dosage
• patient_id (Foreign Key → Patients)
• doctor_id (Foreign Key → Doctors)
• prescription_date
6. Hospital Bills
• bill_id (Primary Key)
• patient_id (Foreign Key → Patients)
• total_amount
• bill_date

Relationships
• Appointments links Patients and Doctors (Many-to-Many)
• Medications links Patients and Doctors (Many-to-Many)
• Hospital Bills belongs to Patients (One-to-Many)
• Doctors have a department_id, but no foreign key is explicitly set in
the schema.
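For illustration, the entities above could be declared in SQLite roughly as shown below. The column types, the table name Hospital_Bills, and the sample rows are assumptions, since the report lists only attribute names and keys. The sketch also shows that a join on Doctors.department_id still works even though no foreign key is declared on it; the query is one hand-written example of the kind of SQL the system is meant to produce, not output from the system.

```python
import sqlite3

# Column types are assumed; the report gives only names and key roles.
DDL = """
CREATE TABLE Departments (department_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Patients (
    patient_id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, date_of_birth TEXT,
    gender TEXT, contact_number TEXT, address TEXT
);
CREATE TABLE Doctors (
    doctor_id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, specialty TEXT,
    contact_number TEXT,
    department_id INTEGER  -- no FOREIGN KEY clause, as noted in the report
);
CREATE TABLE Appointments (
    appointment_id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES Patients(patient_id),
    doctor_id INTEGER REFERENCES Doctors(doctor_id),
    appointment_date TEXT, appointment_time TEXT, diagnosis TEXT
);
CREATE TABLE Medications (
    medication_id INTEGER PRIMARY KEY,
    name TEXT, dosage TEXT,
    patient_id INTEGER REFERENCES Patients(patient_id),
    doctor_id INTEGER REFERENCES Doctors(doctor_id),
    prescription_date TEXT
);
CREATE TABLE Hospital_Bills (
    bill_id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES Patients(patient_id),
    total_amount REAL, bill_date TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Made-up sample rows, only to make the query below return something.
conn.execute("INSERT INTO Departments VALUES (1, 'Cardiology')")
conn.execute(
    "INSERT INTO Patients (patient_id, first_name, last_name) "
    "VALUES (1, 'Asha', 'Rao'), (2, 'Vikram', 'Nair')"
)
conn.execute(
    "INSERT INTO Doctors (doctor_id, first_name, last_name, department_id) "
    "VALUES (1, 'Meera', 'Iyer', 1)"
)
conn.executemany(
    "INSERT INTO Appointments (patient_id, doctor_id, appointment_date) VALUES (?, ?, ?)",
    [(1, 1, "2025-01-10"), (2, 1, "2025-01-11")],
)

# "How many distinct patients has each doctor seen, and in which department?"
QUERY = """
SELECT d.first_name || ' ' || d.last_name AS doctor,
       dep.name AS department,
       COUNT(DISTINCT a.patient_id) AS patients_seen
FROM Doctors AS d
JOIN Departments AS dep ON dep.department_id = d.department_id
JOIN Appointments AS a ON a.doctor_id = d.doctor_id
GROUP BY d.doctor_id
"""
rows = conn.execute(QUERY).fetchall()
doctor_fks = conn.execute('PRAGMA foreign_key_list("Doctors")').fetchall()
print(rows, doctor_fks)
```

Because Doctors declares no foreign key, a schema graph built only from declared constraints would contain no Doctors-to-Departments edge, so the model would have to infer that join from the shared column name.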
LITERATURE SURVEY
1. "SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation" (2023)
• Authors: Adrián Bazaga, Pietro Liò, Gos Micklem
• Summary: This paper introduces SQLformer, a novel Transformer-based
architecture designed for text-to-SQL translation. The model predicts SQL
queries as abstract syntax trees (ASTs) in an autoregressive manner,
incorporating structural inductive biases guided by database table and
column selection. Experiments demonstrate that SQLformer achieves
state-of-the-art performance across six prominent text-to-SQL benchmarks.
2. "GenSQL—NLP-Based SQL Generation" (2022)
• Authors: M. Sri Geetha, R. Yashwanthika, M. Sanjana Sri, M. Sudiksa
• Summary: This study proposes an NLP-based model to convert natural
language utterances into SQL queries. The framework addresses schema
encoding by utilizing a relation-aware self-attention mechanism within an
encoder-decoder architecture. The model's performance is evaluated based
on specific coordination and complexity norms of inquiries, demonstrating
improvements in text-to-SQL tasks as evidenced by results on the Spider
dataset.
3. "Deep Learning Driven Natural Languages Text to SQL Query Conversion: A Survey" (2022)
• Authors: Ayush Kumar, Parth Nagarkar, Prabhav Nalhe, Sanjeev Vijayakumar
• Summary: This survey provides a comprehensive overview of 24 recent
neural network models developed for text-to-SQL conversion. It discusses
various architectures, including convolutional neural networks, recurrent
neural networks, pointer networks, reinforcement learning, and generative
models. The paper also reviews 11 widely used datasets for training
text-to-SQL models and explores future application possibilities for
seamless data querying.
4. "SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy" (2024)
• Authors: Tingkai Zhang, Chaoyu Chen, Cong Liao, Jun Wang, Xudong Zhao,
Hang Yu, Jianchao Wang, Jianguo Li, Wenhui Shi
• Summary: SQLfuse is a system that integrates open-source Large Language
Models with a suite of tools to enhance text-to-SQL translation accuracy
and usability. It features modules for schema mining, schema linking, SQL
generation, and a SQL critic module to continuously improve query quality.
The system demonstrates leading performance on the Spider Leaderboard and
practical deployment in business contexts.
RESULTS

The system has been tested with multiple database schemas and query
types, demonstrating high accuracy in SQL query generation. Some key
findings include:
• Improved Query Accuracy – Using schema context from the
knowledge graph reduces errors in table joins and column references.
• Faster Query Formulation – Users can retrieve information without
writing SQL statements by hand.
• User-Friendly Experience – The web interface allows intuitive
interaction, making database querying accessible to non-technical
users.
Workflow
1. User uploads a database file (SQLite format).
2. The system extracts schema information and builds a knowledge
graph.
3. User inputs a natural language query.
4. The NLP model processes the input and generates an SQL query.
5. The system executes the SQL query and retrieves results.
6. Results are displayed in the user interface.
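The workflow above can be sketched as a single function. This is an illustrative outline under stated assumptions, not the project's implementation: it uses the standard sqlite3 module in place of SQLAlchemy, reads the schema from the CREATE TABLE text that SQLite keeps in sqlite_master, and takes the model as a generate_sql callable so that no real API call is made. The prompt wording, the function names, and the fake_model stand-in are all hypothetical.

```python
import sqlite3


def read_schema_sql(conn):
    """Collect the CREATE TABLE statements SQLite stores for each table."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(sql for (sql,) in rows)


def build_prompt(schema_sql, question):
    """Combine schema text and the user's question (prompt wording is an assumption)."""
    return (
        "Given this SQLite schema:\n"
        f"{schema_sql}\n"
        f"Write one SQL query that answers: {question}\n"
        "Return only the SQL."
    )


def answer_question(conn, question, generate_sql):
    """Workflow steps 2 to 5; generate_sql is any callable mapping a prompt to SQL text."""
    prompt = build_prompt(read_schema_sql(conn), question)
    sql = generate_sql(prompt)
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return sql, columns, cursor.fetchall()


def fake_model(prompt):
    # Stand-in for the language model call; always returns the same query.
    return (
        "SELECT patient_id, SUM(total_amount) AS total "
        "FROM Hospital_Bills GROUP BY patient_id ORDER BY patient_id"
    )


# Step 1 stand-in: an in-memory database instead of an uploaded file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hospital_Bills (
        bill_id INTEGER PRIMARY KEY, patient_id INTEGER,
        total_amount REAL, bill_date TEXT
    );
    INSERT INTO Hospital_Bills (patient_id, total_amount)
    VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")
sql, columns, rows = answer_question(conn, "What is the total billed per patient?", fake_model)
print(columns, rows)
```

Passing the model as a callable keeps the pipeline testable without network access; a real deployment would replace fake_model with a function that sends the prompt to the chosen model and returns its reply.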
FUTURE SCOPE

While the current implementation demonstrates significant
improvements in accessibility and efficiency, future enhancements
include:
• Support for Additional Database Systems – Extending
compatibility to MySQL, PostgreSQL, and other relational
databases.
• Advanced Query Optimization – Enhancing AI-generated
queries to improve performance on large datasets.
• Integration with Business Intelligence (BI) Tools – Allowing
users to visualize query results through dashboards and charts.
• Voice-Based Querying – Enabling users to input queries via
speech-to-text technology for a more natural interaction.
• Security Enhancements – Implementing strict input
validation and protection against SQL injection attacks.
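As one possible direction for the security item above, model-generated SQL could be checked before it runs and then executed with writes disabled. The sketch below is an assumption about how that might look, not part of the reported system: is_single_select and run_read_only are hypothetical helper names, and the prefix check is deliberately coarse (it also rejects legitimate queries that start with WITH or contain a semicolon inside a string literal). SQLite's PRAGMA query_only setting acts as a second layer by refusing writes at the database level.

```python
import sqlite3


def is_single_select(sql):
    """Coarse filter: accept exactly one statement whose first keyword is SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    return statement.split(None, 1)[0].upper() == "SELECT"


def run_read_only(conn, sql):
    """Run a generated query only if it passes the filter, with writes disabled."""
    if not is_single_select(sql):
        raise ValueError("only a single SELECT statement is allowed")
    conn.execute("PRAGMA query_only = ON")  # SQLite refuses writes while this is on
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")


# Small illustrative database with one made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (patient_id INTEGER PRIMARY KEY, first_name TEXT)")
conn.execute("INSERT INTO Patients VALUES (1, 'Asha')")
conn.commit()

print(run_read_only(conn, "SELECT first_name FROM Patients;"))
try:
    run_read_only(conn, "SELECT 1; DROP TABLE Patients")
except ValueError as exc:
    print("rejected:", exc)
```

A keyword filter alone is easy to get wrong, which is why the sketch pairs it with a database-level restriction rather than relying on string inspection.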
CONCLUSION

This project successfully demonstrates that AI-powered natural language
processing can bridge the gap between users and databases by automating
SQL query generation. By integrating schema knowledge graphs and
OpenAI’s GPT-3.5-turbo, the system ensures accurate query generation and
execution. The user-friendly interface enables non-technical users to
interact with databases efficiently, reducing the need for SQL expertise.
Future improvements will further enhance the system’s capabilities, making
it an essential tool for data-driven decision-making across various domains.
By expanding support for multiple database systems, optimizing query
generation, and integrating with advanced BI tools, this system has the
potential to revolutionize how users interact with structured data.
