SQLyzr: A Comprehensive Benchmark and Evaluation Platform for Text-to-SQL

Abedini, Sepideh; Özsu, M. Tamer

Abstract:Text-to-SQL models have significantly improved with the adoption of Large Language Models (LLMs), leading to their increasing use in real-world applications. Although many benchmarks exist for evaluating the performance of text-to-SQL models, they often rely on a single aggregate score, lack evaluation under realistic settings, and provide limited insight into model behaviour across different query types. In this work, we present SQLyzr, a comprehensive benchmark and evaluation platform for text-to-SQL models. SQLyzr incorporates a diverse set of evaluation metrics that capture multiple aspects of generated queries, while enabling more realistic evaluation through workload alignment with real-world SQL usage patterns and database scaling. It further supports fine-grained query classification, error analysis, and workload augmentation, allowing users to better diagnose and improve text-to-SQL models. This demonstration showcases these capabilities through an interactive experience. Through SQLyzr's graphical interface, users can customize evaluation settings, analyze fine-grained reports, and explore additional features of the platform. We envision that SQLyzr facilitates the evaluation and iterative improvement of text-to-SQL models by addressing key limitations of existing benchmarks. The source code of SQLyzr is available at this https URL.

Subjects:	Databases (cs.DB); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2604.21214 [cs.DB]
	(or arXiv:2604.21214v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2604.21214

Computer Science > Databases

Title:SQLyzr: A Comprehensive Benchmark and Evaluation Platform for Text-to-SQL

Submission history

Access Paper:

Current browse context:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators