A Flask-based web application that allows users to interact with the Paglia database on PostgreSQL using natural language queries based on AI agents to rewrite user questions, generate SQL queries, validate them, and display the results in a modern, chat-like interface.
sample.mp4
- Clone tepository:
git clone https://github.com/your-username/text-to-sql.git
cd text-to-sql- Create and Activate a Virtual Environment:
python -m venv venv
venv\Scripts\activate- Install Dependencies:
pip install -r requirements.txt- Set Up Environment Variables: Create a
.envfile in the project root and add your Google Gemini API key:
API_KEY=your_google_gemini_api_key- Set up and Configure the Database: Ensure PostgreSQL is running locally with the following credentials:
git clone https://github.com/devrimgunduz/pagila.git
docker-compose up
docker exec -it pagila psql -U postgres- Host: localhost
- Port: 5432
- Database: postgres
- User: postgres
- Password: 123456
Modify backend/database.py if your configuration differs.
- Run the Application:
cd app
python app.py- Access the Web Interface: Open a browser and navigate to http://127.0.0.1:5000
- Researched Existing Methods
- Metadata Generation by extracting database metadata from PostgreSQL, and also fed that along with image diagram to Gemini to create descriptive data about tables, columns and their relationships.
- Data Retrieval by creating a
Databaseclass that can connect and retrieve queries as Pandas Dataframes. - Experimented with different approaches, evaluated results and and settled on Agents.
- Built Custom AI Agents for different aspects of query generation
- Rewriter Agent: will check if the question makes sense and ask for clarifications, if it does then will generate a transformed detailed question that generator uses.
- Generator Agent: uses rewritten question to transform into queries, and can take risks by creating long complex queries.
- Validator Agent: checks for correctness of the generated query and optimizes.
- Created a Flask based simple Web App that can be used to interact with the databse.
- Evaluated all evalution questions and stored them in
.\evaluation_report.csv
- Rewriting: The RewriterAgent refines the user query for better SQL alignment and asks for clarification if needed.
- Generation: The GeneratorAgent converts the refined query into a SQL statement.
- Validation: The ValidatorAgent ensures the SQL query is correct and optimized.
- Execution: The Database class executes the validated SQL query and returns the results.
- Backend: Flask, PostgreSQL
- Frontend: HTML, CSS, JavaScript
- AI Integration: Google Gemini API
- Enter your natural language query in the chat box.
- If prompted, provide additional clarification.
- View the generated SQL query and the resulting data table.
- Right now, it only works on Palgia Dataset but it could easily be generalized to work on any dataset, as metadata generation workflow can be automated.
- An additional RAG system could be built to retrieve the most relavant tables using the metadata, instead of relying on giving huge amounts of metadata to Agents in on go, using historical and previous SQL query logs on the database.
- Shorter and better Metadata should be generated, through emperical testing.
- Table Augmented Generation (TAG) can be implemented on top of this after the database is retrieved to give the user insights about the retrieved tables.