Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect
Table 1: Statistics for recent text-to-SQL datasets. #Size, #DB, #D, and #T/DB represent the number of question-SQL pairs, databases, and domains, and the average number of tables per domain, respectively. A "-" in the #D column indicates an unknown number of domains, and a "-" in the Issues Addressed column indicates no specific issue addressed by the dataset. Datasets above and below the line are cross-domain and single-domain, respectively. The complete statistics are listed in Table 7 in Appendix C.
to evaluation (§ 4)² and highlight potential directions for future work (§ 5). Appendix A shows the topology for the text-to-SQL task.

² Note that most work discussed in this paper is in English unless otherwise specified.

2 Datasets

As shown in Table 1, existing text-to-SQL datasets can be classified into three categories: single-domain datasets, cross-domain datasets, and others.

Single-Domain Datasets  Single-domain text-to-SQL datasets typically collect question-SQL pairs for a single database in some real-world task, including early ones such as Academic (Li and Jagadish, 2014), Advising (Finegan-Dollak et al., 2018), ATIS (Price, 1990; Dahl et al., 1994), GeoQuery (Zelle and Mooney, 1996), Yelp and IMDB (Yaghmazadeh et al., 2017), Scholar (Iyer et al., 2017), and Restaurants (Tang and Mooney, 2000; Popescu et al., 2003), as well as recent ones such as SEDE (Hazoom et al., 2021), ESQL (Chen et al., 2021a), and MIMICSQL (Wang et al., 2020d).

These single-domain datasets, particularly the early ones, are usually limited in size, containing only a few hundred to a few thousand examples. Because of the limited size and the similar SQL patterns between the training and testing phases, text-to-SQL models trained on these single-domain datasets can achieve decent performance by simply memorizing the SQL patterns, yet they fail to generalize to unseen SQL queries or SQL queries from other domains (Finegan-Dollak et al., 2018; Yu et al., 2018c). However, since these datasets are adapted from real-life applications, most of them contain domain knowledge (Gan et al., 2021b) and dataset conventions (Suhr et al., 2020). Thus, they are still valuable for evaluating models' ability to generalize to new domains and for exploring how to incorporate domain knowledge and dataset conventions into model predictions. Appendix B gives a detailed discussion of domain knowledge and dataset conventions, together with concrete text-to-SQL examples.

Large-Scale Cross-Domain Datasets  Large cross-domain datasets such as WikiSQL (Zhong et al., 2017) and Spider (Yu et al., 2018c) are proposed to better evaluate deep neural models. WikiSQL uses tables extracted from Wikipedia and lets annotators paraphrase questions generated for the tables. Compared to other datasets, WikiSQL is an order of magnitude larger, containing 80,654 natural utterances in total (Zhong et al., 2017). However, WikiSQL contains only simple SQL queries, and only a single table is queried within each SQL query (Yu et al., 2018c).

Yu et al. (2018c) propose Spider, which contains 200 databases with an average of 5 tables per database, to test models' performance on complicated unseen SQL queries and their ability to generalize to new domains.
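To make the contrast with single-table WikiSQL concrete, the kind of multi-table question Spider targets can be sketched as follows. The schema, data, and question below are invented for illustration and are not drawn from Spider itself:

```python
import sqlite3

# A toy two-table database in the spirit of Spider's multi-table settings.
# Schema, rows, and the question are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, singer_id INTEGER,
                      year INTEGER,
                      FOREIGN KEY (singer_id) REFERENCES singer(singer_id));
INSERT INTO singer VALUES (1, 'Ann', 'France'), (2, 'Bob', 'Japan');
INSERT INTO concert VALUES (10, 1, 2019), (11, 1, 2020), (12, 2, 2020);
""")

# Question: "How many concerts did each French singer give?"
# Answering it requires a join across tables -- the kind of query absent
# from WikiSQL but central to Spider.
sql = """
SELECT s.name, COUNT(*) FROM singer s
JOIN concert c ON s.singer_id = c.singer_id
WHERE s.country = 'France' GROUP BY s.name
"""
rows = cur.execute(sql).fetchall()
print(rows)  # -> [('Ann', 2)]
```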
Furthermore, researchers expand Spider to study various issues of their interest (Lei et al., 2020; Zeng et al., 2020; Gan et al., 2021b; Taniguchi et al., 2021; Gan et al., 2021a).

Besides, researchers build several large-scale text-to-SQL datasets in different languages, such as CSpider (Min et al., 2019a), TableQA (Sun et al., 2020), and DuSQL (Wang et al., 2020c) in Chinese, ViText2SQL (Tuan Nguyen et al., 2020) in Vietnamese, and PortugueseSpider (José and Cozman, 2021) in Portuguese. Given that human translation has been shown to be more accurate than machine translation (Min et al., 2019a), these datasets are annotated mainly by human experts based on the English Spider dataset. These Spider-based datasets can serve as potential resources for multi-lingual text-to-SQL research.

Other Datasets  Several context-dependent text-to-SQL datasets have been proposed, which involve user interactions with the text-to-SQL system in English (Price, 1990; Dahl et al., 1994; Yu et al., 2019a,b) and Chinese (Guo et al., 2021). In addition, researchers collect datasets to study whether questions in text-to-SQL are answerable or not (Zhang et al., 2020), lexicon-level mapping (Shi et al., 2020b), and cross-domain evaluation on real Web databases (Lee et al., 2021).

Appendix C.1 discusses more details about the datasets mentioned in § 2.

3 Methods

Early text-to-SQL systems employ rule-based and template-based methods (Li and Jagadish, 2014; Mahmud et al., 2015), which are suitable for simple user queries and databases. However, with the progress in both the DB and NLP communities, recent work focuses on more complex settings (Yu et al., 2018c). In these settings, deep models can be more useful because of their strong feature representation and generalization abilities.

In this survey, we focus primarily on deep learning methods. We divide the methods employed in text-to-SQL research into Data Augmentation (§ 3.1), Encoding (§ 3.2), Decoding (§ 3.3), Learning Techniques (§ 3.4), and Miscellaneous (§ 3.5).

3.1 Data Augmentation

Data augmentation can help text-to-SQL models handle complex or unseen questions (Zhong et al., 2020b; Wang et al., 2021b), achieve state-of-the-art results with less supervised data (Guo et al., 2018), and attain robustness towards different types of questions (Radhakrishnan et al., 2020).

Typical data augmentation techniques involve paraphrasing questions and filling pre-defined templates to increase data diversity. Iyer et al. (2017) use the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013) to generate paraphrases for training questions. Appendix B gives an example of this augmentation method. Iyer et al. (2017) and Yu et al. (2018b) collect question-SQL templates and fill them in with DB schemas. Researchers also employ neural models to generate natural utterances for sampled SQL queries to acquire more data. For instance, Li et al. (2020a) fine-tune a pre-trained T5 model (Raffel et al., 2019) on WikiSQL using the SQL query as input to predict the natural utterance, and then randomly synthesize SQL queries from tables in WikiSQL and use the tuned model to generate the corresponding natural utterances.

The quality of the augmented data is important because low-quality data can hurt model performance (Wu et al., 2021). Various approaches have been exploited to improve the quality of the augmented data. After sampling SQL queries, Zhong et al. (2020b) employ an utterance generator to generate natural utterances and a semantic parser to convert the generated natural utterances back to SQL queries. To filter out low-quality augmented data, Zhong et al. (2020b) only keep data whose generated SQL queries are the same as the sampled ones. Wu et al. (2021) use a hierarchical SQL-to-question generation process to obtain high-quality data. Observing that there is a strong segment-level mapping between SQL queries and natural utterances, Wu et al. (2021) decompose SQL queries into several clauses, translate each clause into a sub-question, and then combine the sub-questions into a complete question.

To increase the diversity of the augmented data, Guo et al. (2018) incorporate a latent variable in their SQL-to-text model to encourage question diversity. Radhakrishnan et al. (2020) augment the WikiSQL dataset by simplifying and compressing questions to simulate the colloquial query behavior of end-users. Wang et al. (2021b) exploit a probabilistic context-free grammar (PCFG) to explicitly model the composition of SQL queries, encouraging the sampling of compositional SQL queries.
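The template-filling style of augmentation described above can be sketched minimally as follows. The template, schema, and helper function are invented for illustration and are far simpler than the cited systems:

```python
# A toy question-SQL template pair; {col} and {table} are slots to be
# filled with items from a DB schema, as in template-based augmentation.
# Template and schema are hypothetical.
question_tmpl = "What is the maximum {col} of all {table}s?"
sql_tmpl = "SELECT MAX({col}) FROM {table}"

# An invented schema: table name -> numeric columns.
schema = {"car": ["price", "horsepower"], "house": ["area"]}

def fill_templates(schema):
    """Instantiate the templates with every (table, column) pair."""
    pairs = []
    for table, cols in schema.items():
        for col in cols:
            pairs.append((question_tmpl.format(col=col, table=table),
                          sql_tmpl.format(col=col, table=table)))
    return pairs

augmented = fill_templates(schema)
print(len(augmented))   # 3 synthesized question-SQL pairs
print(augmented[0])
```

Real systems additionally paraphrase the synthesized questions so the surface forms are less rigid than the raw template output.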
Methods         Adopted by                     Applied datasets
Encode type     TypeSQL (Yu et al., 2018a)     WikiSQL
Graph-based     GNN (Bogin et al., 2019a)      Spider
Self-attention  RAT-SQL (Wang et al., 2020a)   Spider
Adapt PLM       SQLova (Hwang et al., 2019)    WikiSQL
Pre-training    TaBERT (Yin et al., 2020)      Spider

Table 2: Typical methods used for encoding in text-to-SQL. The full table of existing methods and more details are listed in Table 8 in Appendix D.

3.2 Encoding

Various methods have been adopted to address the challenges of representing the meaning of questions, representing the structure of DB schemas, and linking the DB content to the question. We group them into five categories, as shown in Table 2.

Encode Token Types  To better encode keywords such as entities and numbers in questions, Yu et al. (2018a) assign a type to each word in the question, with a word being an entity from the knowledge graph, a column, or a number. Yu et al. (2018c) concatenate word embeddings and the corresponding type embeddings to feed into their model.

Graph-based Methods  Since DB schemas contain rich structural information, graph-based methods are used to better encode such structures. As summarized in § 2, datasets prior to Spider typically involve simple DBs that contain only one table, or a single DB in both training and testing. As a result, modeling the DB schema received little attention. Because Spider contains complex and different DBs in training and testing, Bogin et al. (2019a) propose to use graphs to represent the structure of the DB schemas. Specifically, Bogin et al. (2019a) use nodes to represent tables and columns, and edges to represent relationships between tables and columns, such as a table containing a column, primary keys, and foreign key constraints, and then use graph neural networks (GNNs) (Li et al., 2016) to encode the graph structure. In their subsequent work, Bogin et al. (2019b) use a graph convolutional network (GCN) to capture DB structures and a gated GCN to select the relevant DB information for SQL generation. RAT-SQL (Wang et al., 2020a) encodes more relationships for DB schemas, such as "both columns are from the same table", in its graph.

Graphs have also been used to encode questions together with DB schemas. Researchers have been using different types of graphs to capture the semantics in NL and facilitate linking between NL and table schemas. Cao et al. (2021) adopt a line graph (Gross et al., 2018) to capture multi-hop semantics via meta-paths (e.g., an exact match between a question token and a column, together with the column belonging to a table, can form a 2-hop meta-path) and distinguish between local and non-local neighbors so that different tables and columns are attended to differently. SADGA (Cai et al., 2021) adopts the graph structure to provide a unified encoding for both natural utterances and DB schemas to help question-schema linking. Apart from the relations between entities in questions and DB schemas and the structure of DB schemas, S²SQL (Hui et al., 2022) integrates syntax dependencies among question tokens into the graph to improve model performance. To improve the generalization of graph methods to unseen domains, ShadowGNN (Chen et al., 2021b) ignores the names of tables and columns in the database and uses abstract schemas in a graph projection neural network to obtain delexicalized representations of questions and DB schemas.

Finally, graph-based techniques are also exploited in context-dependent text-to-SQL. For instance, IGSQL (Cai and Wan, 2020) uses a graph encoder to utilize historical information of DB schemas from previous turns.

Self-attention  Models using transformer-based encoders (He et al., 2019; Hwang et al., 2019; Xie et al., 2022) incorporate the original self-attention mechanism by default because it is the building block of the transformer structure. RAT-SQL (Wang et al., 2020a) applies relation-aware self-attention, a modified version of self-attention (Vaswani et al., 2017), to leverage the relations of tables and columns. DuoRAT (Scholak et al., 2021a) also adopts such relation-aware self-attention in its encoder.

Adapt PLM  Various methods have been proposed to leverage the knowledge in pre-trained language models (PLMs) and better align PLMs with the text-to-SQL task. PLMs such as BERT (Devlin et al., 2019) are used to encode questions and DB schemas. The modus operandi is to input the concatenation of question words and schema words to the BERT encoder (Hwang et al., 2019; Choi et al., 2021). Other methods adjust the embeddings produced by PLMs. On WikiSQL, for instance, X-SQL (He et al., 2019) replaces segment embeddings from the pre-trained encoder with column type embeddings. Guo and Gao (2019) encode two additional feature vectors for matching between question tokens and table cells as well as column names, and concatenate them with the BERT embeddings of questions and DB schemas.

HydraNet (Lyu et al., 2020) uses BERT to encode the question and an individual column, aligning with the tasks BERT is pre-trained on. After obtaining the BERT representations of all columns, Lyu et al. (2020) select the top-ranked columns for SQL prediction. Liu et al. (2021b) train an auxiliary concept prediction module to predict which tables and columns correspond to the question. They detect important question tokens by detecting the largest drop in the confidence score caused by erasing a token from the question. Lastly, they train the PLM with a grounding module using the question tokens and the corresponding tables and columns. Through empirical studies, Liu et al. (2021b) claim that their approach can awaken the latent grounding in PLMs via this erase-and-predict technique.

Pre-training  Various works propose different pre-training objectives and use different pre-training data to better align transformer-based encoders with the text-to-SQL task. For instance, TaBERT (Yin et al., 2020) uses tabular data to pre-train BERT with the objectives of masked column prediction and cell value recovery. Grappa (Yu et al., 2021) synthesizes question-SQL pairs over tables and pre-trains BERT with the objectives of masked language modeling (MLM), predicting whether a column appears in the SQL query, and predicting which SQL operations are triggered. GAP (Shi et al., 2020a) pre-trains BART (Lewis et al., 2020) on synthesized text-to-SQL and tabular data with the objectives of MLM, column prediction, column recovery, and SQL generation.

3.3 Decoding

Various methods have been proposed for decoding to achieve a fine-grained and easier process for SQL generation and to bridge the gap between natural language and SQL queries. As shown in Table 3, we group these methods into five main categories and other techniques.

Methods     Adopted by                          Applied datasets
Tree        SyntaxSQLNet (Yu et al., 2018b)     Spider
Sketch      SQLNet (Xu et al., 2017)            WikiSQL
Bottom-up   SmBop (Rubin and Berant, 2021)      Spider
Attention   Wang et al. (2019)                  WikiSQL
Copy        Wang et al. (2018a)                 WikiSQL
IR          IRNet (Guo et al., 2019)            Spider
Others      Global-GCN (Bogin et al., 2019b)    Spider
            Kelkar et al. (2020)                Spider

Table 3: Typical methods used for decoding in text-to-SQL. The full table and more details are listed in Table 9 in Appendix D. IR: Intermediate Representation.

Tree-based  Seq2Tree (Dong and Lapata, 2016) employs a decoder that generates logical forms in a top-down manner. The components in a sub-tree are generated conditioned on their parents in addition to the input question. Note that the syntax of the logical forms is implicitly learned from data in Seq2Tree. Similarly, Seq2AST (Yin and Neubig, 2017) uses an abstract syntax tree (AST) for decoding the target programming language, where the syntax is explicitly integrated with the AST. Although both Seq2Tree (Dong and Lapata, 2016) and Seq2AST (Yin and Neubig, 2017) do not study text-to-SQL datasets, their use of trees inspires tree-based decoding in text-to-SQL. SyntaxSQLNet (Yu et al., 2018b) employs a tree-based decoding method specific to SQL syntax and recursively calls modules to predict different SQL components.

Sketch-based  SQLNet (Xu et al., 2017) designs a sketch aligned with the SQL grammar, and SQLNet only needs to fill in the slots in the sketch rather than predict both the output grammar and the content. Besides, the sketch captures the dependencies of the predictions. Thus, the prediction of one slot is conditioned only on the slots it depends on, which avoids the issue of the same SQL query having varied equivalent serializations. Dong and Lapata (2018) decompose decoding into two stages, where the first decoder predicts a rough sketch, and the second decoder fills in the low-level details conditioned on the question and the sketch. Such coarse-to-fine decoding has also been adopted in other works such as IRNet (Guo et al., 2019).
To address complex SQL queries with nested structures, RYANSQL (Choi et al., 2021) recursively yields SELECT statements and uses sketch-based slot filling for each of the SELECT statements.

Bottom-up  Both the tree-based and the sketch-based decoding mechanisms can be viewed as top-down decoding mechanisms. Rubin and Berant (2021) use a bottom-up decoding mechanism. Given K trees of height t, the decoder scores trees of height t + 1 constructed from the current beam with the SQL grammar, and the K trees with the highest scores are kept. Then, representations of the new K trees are generated and placed in the new beam.

Attention Mechanism  To integrate the encoder-side information during decoding, an attention score is computed and multiplied with the hidden vectors from the encoder to get the context vector, which is then used to generate an output token (Dong and Lapata, 2016; Zhong et al., 2017).

Variants of the attention mechanism have been used to better propagate the information encoded from questions and DB schemas to the decoder. SQLNet (Xu et al., 2017) designs column attention, where the hidden states of columns are multiplied by the embeddings of the question to calculate attention scores for a column given the question. Guo and Gao (2018) incorporate bi-attention over the question and column names for SQL component selection. Wang et al. (2019) adopt structured attention (Kim et al., 2017) by computing marginal probabilities to fill in the slots in their generated abstract SQL queries. DuoRAT (Scholak et al., 2021a) adopts the relation-aware self-attention mechanism in both its encoder and decoder. Other works that use sequence-to-sequence or decoder-only transformer-based models incorporate the self-attention mechanism by default (Scholak et al., 2021b; Xie et al., 2022).

Copy Mechanism  Seq2AST (Yin and Neubig, 2017) and Seq2SQL (Zhong et al., 2017) employ the pointer network (Vinyals et al., 2015) to compute the probability of copying words from the input. Wang et al. (2018a) use types (e.g., columns, SQL operators, constants from questions) to explicitly restrict the locations in the query to copy from, and develop a new training objective to copy only from the first occurrence in the input. In addition, the copy mechanism is also adopted in the context-dependent text-to-SQL task (Wang et al., 2020b).

Intermediate Representations  Researchers use intermediate representations to bridge the gap between natural language and SQL queries. IncSQL (Shi et al., 2018) defines actions for different SQL components and lets the decoder decode actions instead of SQL queries. IRNet (Guo et al., 2019) introduces SemQL, an intermediate representation for SQL queries that can cover most of the challenging Spider benchmark. Specifically, SemQL removes the JOIN ON, FROM, and GROUP BY clauses and merges the HAVING and WHERE clauses of SQL queries. ValueNet (Brunner and Stockinger, 2021) uses SemQL 2.0, which extends SemQL to include value representations. Based on SemQL, NatSQL (Gan et al., 2021c) removes the set operators³. Suhr et al. (2020) implement SemQL as a mapping from SQL to a representation with an under-specified FROM clause, which they call SQL^UF. Rubin and Berant (2021) employ a relational algebra augmented with SQL operators as the intermediate representation.

³ The operators that combine the results of two or more SELECT statements, such as INTERSECT.

However, intermediate representations are usually designed for a specific dataset and cannot be easily adapted to others (Suhr et al., 2020). To construct a more generalized intermediate representation, Herzig et al. (2021) propose to omit tokens in the SQL query that do not align to any phrase in the utterance.

Inspired by the success of the text-to-SQL task, intermediate representations are also studied for SPARQL, another executable language for database systems (Saparina and Osokin, 2021; Herzig et al., 2021).

Others  PICARD (Scholak et al., 2021b) and UniSAr (Dou et al., 2022) set constraints on the decoder to prevent it from generating invalid tokens. Several methods adopt an execution-guided decoding mechanism to exclude non-executable partial SQL queries from the output candidates (Wang et al., 2018b; Hwang et al., 2019). Global-GNN (Bogin et al., 2019b) employs a separately trained discriminative model to rerank the top-K SQL queries in the decoder's output beam, which reasons about complete SQL queries instead of considering each word and DB schema in isolation. Similarly, Kelkar et al. (2020) train a separate discriminator to better search among candidate SQL queries. Xu et al. (2017); Yu et al. (2018b); Guo and Gao (2018); Lee (2019) use separate submodules to predict different SQL components, easing the difficulty of generating a complete SQL query. Chen et al. (2020b) employ a gate to select, at each step, between the output sequence encoded for the question and the output sequence from the previous decoding steps for SQL generation. Inspired by machine translation, Müller and Vlachos (2019) apply byte-pair encoding (BPE) (Sennrich et al., 2016) to compress SQL queries into shorter sequences guided by the AST, reducing the difficulty of SQL generation.

3.4 Learning Techniques

Apart from end-to-end supervised learning, different learning techniques have been proposed to help text-to-SQL research. Here we summarize these learning techniques, each addressing a specific issue for the task.

Fully supervised  Ni et al. (2020) adopt active learning to save human annotation effort. Yao et al. (2019, 2020); Li et al. (2020b) employ interactive or imitation learning to enhance text-to-SQL systems via interactions with end-users. Huang et al. (2018); Wang et al. (2021a); Chen et al. (2021a) adopt meta-learning (Finn et al., 2017) for domain generalization. Various multi-task learning settings have been proposed to improve text-to-SQL models by enhancing their abilities on relevant tasks. Chang et al. (2020) set an auxiliary task of mapping between columns and condition values. SeaD (Xuan et al., 2021) integrates two denoising objectives to help the model better encode information from the structural data. Hui et al. (2021b) integrate a task of learning the correspondence between questions and DB schemas. Shi et al. (2021) integrate a column classification task to classify which columns appear in the SQL query. McCann et al. (2018) and Xie et al. (2022) train their models with other semantic parsing tasks, which improves the models' performance on the text-to-SQL task.

Weakly supervised  Seq2SQL (Zhong et al., 2017) uses reinforcement learning to learn the WHERE clause, allowing different orders of the components in the WHERE clause. Liang et al. (2018) leverage a memory buffer to reduce the variance of policy gradient estimates when applying reinforcement learning to text-to-SQL. Agarwal et al. (2019) use meta-learning and Bayesian optimization (Snoek et al., 2012) to learn an auxiliary reward that discounts spurious SQL queries in SQL generation. Min et al. (2019b) model the possible SQL queries as a discrete latent variable and adopt hard-EM-style parameter updates, letting their model take advantage of possible pre-computed solutions.

3.5 Miscellaneous

For DB linking, BRIDGE (Lin et al., 2020) appends a representation of the DB cell values mentioned in the question to the corresponding fields in the encoded sequence, which links the DB content to the question. Ma et al. (2020) employ an explicit extractor for the slots mentioned in the question and then link them with DB schemas.

Model-wise, Finegan-Dollak et al. (2018) use a template-based model that copies slots from the question. Shaw et al. (2021) use a hybrid model that first applies a high-precision grammar-based approach (NQG) to generate SQL queries, and then uses T5 (Raffel et al., 2019) as a back-up if NQG fails. Yan et al. (2020) formulate submodule slot-filling as a machine reading comprehension (MRC) task and apply BERT-based MRC models to it. Besides, DT-Fixup (Xu et al., 2021) designs an optimization approach for deeper Transformers on small datasets for the text-to-SQL task.

For SQL generation, IncSQL (Shi et al., 2018) allows parsers to explore alternative correct action sequences to generate different SQL queries. Brunner and Stockinger (2021) search values in the DB to insert values into the SQL query.

For context-dependent text-to-SQL, researchers adopt techniques such as turn-level encoders and the copy mechanism (Suhr et al., 2018; Zhang et al., 2019; Wang et al., 2020b), constrained decoding (Wang et al., 2020b), a dynamic memory decay mechanism (Hui et al., 2021a), and treating questions and SQL queries as two modalities and using bi-modal pre-trained models (Zheng et al., 2022).

4 Evaluation

Metrics  Table 4 shows widely used automatic evaluation metrics for the text-to-SQL task. Early works evaluate SQL queries by comparing the database querying results of the predicted SQL query and the ground-truth (or gold) SQL query (Zelle and Mooney, 1996; Yaghmazadeh et al., 2017) or by using exact string match to compare the predicted SQL query with the gold one (Finegan-Dollak et al., 2018).
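These two early evaluation styles can be sketched as follows; the toy table and query pair are invented for illustration:

```python
import sqlite3

# Compare a predicted and a gold SQL query under the two early metrics:
# execution accuracy (same result sets) and exact string match.
# Table contents and queries are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, pop INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)", [("a", 10), ("b", 20)])

gold = "SELECT name FROM city WHERE pop > 15"
pred = "SELECT name FROM city WHERE pop >= 16"  # different string, same result

def execution_match(db, q1, q2):
    """Execution accuracy: do the two queries return the same results?"""
    return db.execute(q1).fetchall() == db.execute(q2).fetchall()

def exact_string_match(q1, q2):
    """Exact string match: are the two queries literally identical?"""
    return q1 == q2

print(execution_match(conn, gold, pred))  # True
print(exact_string_match(gold, pred))     # False: a false negative
```

On integer data these two conditions are equivalent, so string match wrongly rejects the prediction; the reverse failure mode of execution match is discussed next.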
Metrics                              Datasets                          Errors
Naive Execution Accuracy             GeoQuery, IMDB, Yelp,             False positive
                                     WikiSQL, etc.
Exact String Match                   Advising, WikiSQL, etc.           False negative
Exact Set Match                      Spider                            False negative
Test Suite Accuracy (execution       Spider, GeoQuery, etc.            False positive
accuracy with generated databases)

Table 4: The summary of metrics, the datasets that use these metrics, and their potential error cases.

However, execution accuracy can create false positives for semantically different SQL queries if they happen to yield the same execution results (Yu et al., 2018c). Exact string match can be too strict, as two different strings can still have the same semantics (Zhong et al., 2020a). Aware of these issues, Yu et al. (2018c) adopt exact set match (ESM) in Spider, deciding the correctness of SQL queries by comparing the sub-clauses of the SQL queries. Zhong et al. (2020a) generate databases that can distinguish the predicted SQL query from the gold one. Both methods are used as official metrics on Spider.

Evaluation Setup  Early single-domain datasets typically use the standard train/dev/test split (Iyer et al., 2017) obtained by splitting the question-SQL pairs randomly. To evaluate generalization to unseen SQL queries within the current domain, Finegan-Dollak et al. (2018) propose the SQL query split, where no SQL query is allowed to appear in more than one of the train, dev, and test sets. Furthermore, Yu et al. (2018c) propose a database split, where the model does not see the databases in the test set at training time. Other splitting methods also exist to support different research topics (Shaw et al., 2021; Chang et al., 2020).

5 Discussion and Future Directions

Ever since the LUNAR system (Woods et al., 1972; Woods, 1973), systems for retrieving DB information have witnessed an increasing amount of research interest and enormous growth, especially in the field of text-to-SQL in the deep learning era. With the ever-increasing model performance on the WikiSQL and Spider leaderboards, one can be optimistic because models are becoming more sophisticated than ever. But there are still several challenges to overcome.

First, these sophisticated models suffer a great performance loss when tested on text-to-SQL datasets from other domains (Suhr et al., 2020; Lee et al., 2021). It is unclear how to incorporate domain knowledge into models trained on Spider and deploy these models efficiently on different domains, especially those with similar information stored in the DB but slightly different DB schemas. Although large-scale datasets promote cross-domain settings, question-SQL pairs from Spider are free from domain knowledge, ambiguity, or domain conventions. Thus, cross-domain text-to-SQL needs to be studied in future research to build a practical cross-domain system that can handle real-world requests.

There are different use cases in real-world scenarios, which require models to be robust in different settings and smart in handling different user requests. For instance, a model trained with DB schemas may need to handle a corrupted table, or no table may be provided at all in practical use. Besides, the input from users can deviate from the standard question input in Spider or WikiSQL, which poses challenges to models trained on these datasets. More user studies need to be done to investigate how well current systems serve end-users and what the input patterns from end-users look like. Apart from SQL queries, administrators may want to change DB schemas, where a system that can translate natural language into such DB commands can be helpful. Also, although there are already works on text-to-SQL beyond English (Min et al., 2019a; Tuan Nguyen et al., 2020; José and Cozman, 2021), we still lack a comprehensive study on multi-lingual text-to-SQL, which can be challenging but useful in real-life scenarios. Finally, it is important to build NLIDBs for people with disabilities. Song et al. (2022) propose speech-to-SQL, which translates voice input into SQL queries and helps visually impaired end-users. More work can be done to address various needs from the perspective of end-users, in particular the needs of minorities.

Text-to-SQL research can also be integrated into a larger scope of research. Application-wise, Xu et al. (2020) develop a question answering system for databases, and Chen et al. (2020a) generate task-oriented dialogue by retrieving knowledge from the database using a text-to-SQL model. One possible direction is to employ the text-to-SQL model to query databases for fact-checking. Research-wise, Guo et al. (2020) compare SQL queries to other logical forms in semantic parsing, and Xie et al. (2022) include text-to-SQL as one of the tasks to achieve a generalized semantic parsing framework. The inter-relations between various logical forms in semantic parsing can be further studied. A generalized framework or a generalized model can come as the fruit of our semantic parsing community.

In hindsight, the development of text-to-SQL has been pushed by innovations in the general ML/NLP community, such as LSTMs (Hochreiter and Schmidhuber, 1997), self-attention (Vaswani et al., 2017), PLMs (Devlin et al., 2019), etc. Recently, prompt learning has achieved decent performance on various tasks, in particular in the low-resource setting (Liu et al., 2021a). Such characteristics align well with the expectation of having a functional text-to-SQL model with only a few training samples. Some recent works already explore applying prompt learning to the text-to-SQL task (Xie et al., 2022). The practical expectation for the text-to-SQL task is to deploy the model in different scenarios, requiring robustness across domains. However, prompt learning struggles with robustness, and its performance can be easily affected by the selected data. This misalignment encourages researchers to study how to employ prompt learning in the real-world text-to-SQL task, which may require further understanding of the cross-domain challenges for text-to-SQL.

Another line of research is to evaluate these sophisticated text-to-SQL systems. The typical measure is to evaluate the performance of the system on some existing datasets. As there are operational systems using NL input to perform tasks such as getting answers from database management system

Stewart for proofreading and suggestions. The work is funded by the Zhejiang Province Key Project 2022SDXHDX0003.

References

Rishabh Agarwal, Chen Liang, Dale Schuurmans, and Mohammad Norouzi. 2019. Learning to generalize from sparse and underspecified rewards. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 130–140. PMLR.

Núria Bertomeu, Hans Uszkoreit, Anette Frank, Hans-Ulrich Krieger, and Brigitte Jörg. 2006. Contextual phenomena and thematic relations in database QA dialogues: results from a Wizard-of-Oz experiment. In Proceedings of the Interactive Question Answering Workshop at HLT-NAACL 2006, pages 1–8, New York, NY, USA. Association for Computational Linguistics.

Shikhar Bharadwaj and Shirish Shevade. 2022. Efficient constituency tree based encoding for natural language to bash translation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3159–3168.

Ben Bogin, Jonathan Berant, and Matt Gardner. 2019a. Representing schema structure with graph neural networks for text-to-SQL parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4560–4565, Florence, Italy. Association for Computational Linguistics.

Ben Bogin, Matt Gardner, and Jonathan Berant. 2019b. Global reasoning over database structures for text-to-SQL parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Confer-
or building ontologies or playing some games, the ence on Natural Language Processing (EMNLP-
performance of these systems can be measured by IJCNLP), pages 3659–3664, Hong Kong, China. As-
the diminution of the (human) time taken to get sociation for Computational Linguistics.
the searched information (Deng et al., 2021; Zhou Sridevi Bonthu, S Rama Sree, and MHM Kr-
et al., 2022). While there are context-dependent ishna Prasad. 2021. Text2PyCode: Machine transla-
text-to-SQL datasets available (Yu et al., 2019a,b), tion of natural language intent to python source code.
In International Cross-Domain Conference for Ma-
researchers can draw inspirations from other fields chine Learning and Knowledge Extraction, pages
of research (Zellers et al., 2021) to design interac- 51–60. Springer.
tive set-ups to evaluate text-to-SQL systems. Ap-
Ursin Brunner and Kurt Stockinger. 2021. Valuenet:
pendix E discusses tasks relevant to the task of
A natural language-to-SQL system that learns from
text-to-SQL. database information. In 2021 IEEE 37th Inter-
national Conference on Data Engineering (ICDE),
Acknowledgement pages 2177–2182. IEEE.
Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang
Yue Zhang is the corresponding author. We thank Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ra-
all reviewers for their insightful comments, and madan, and Milica Gašić. 2018. MultiWOZ - a
Rada Mihalcea, Siqi Shen, Winston Wu and Ian large-scale multi-domain Wizard-of-Oz dataset for
task-oriented dialogue modelling. In Proceedings of DongHyun Choi, Myeong Cheol Shin, EungGyun Kim,
the 2018 Conference on Empirical Methods in Nat- and Dong Ryeol Shin. 2021. RYANSQL: Recur-
ural Language Processing, pages 5016–5026, Brus- sively applying sketch-based slot fillings for com-
sels, Belgium. Association for Computational Lin- plex text-to-SQL in cross-domain databases. Com-
guistics. putational Linguistics, 47(2):309–332.
Ruichu Cai, Jinjie Yuan, Boyan Xu, and Zhifeng Hao. E. F. Codd. 1970. A relational model of data for large
2021. SADGA: Structure-aware dual graph aggre- shared data banks. Commun. ACM, 13(6):377–387.
gation network for text-to-SQL. Advances in Neural
Information Processing Systems, 34. Deborah A. Dahl, Madeleine Bates, Michael Brown,
William Fisher, Kate Hunicke-Smith, David Pallett,
Yitao Cai and Xiaojun Wan. 2020. IGSQL: Database Christine Pao, Alexander Rudnicky, and Elizabeth
schema interaction graph based neural model for Shriberg. 1994. Expanding the scope of the ATIS
context-dependent text-to-SQL generation. In Pro- task: The ATIS-3 corpus. In Human Language Tech-
ceedings of the 2020 Conference on Empirical Meth- nology: Proceedings of a Workshop held at Plains-
ods in Natural Language Processing (EMNLP), boro, New Jersey, March 8-11, 1994.
pages 6903–6912, Online. Association for Compu-
tational Linguistics. Naihao Deng, Shuaichen Chang, Peng Shi, Tao Yu,
and Rui Zhang. 2021. Prefix-to-SQL: Text-to-SQL
Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, generation from incomplete user questions. arXiv
Su Zhu, and Kai Yu. 2021. LGESQL: Line graph en- preprint arXiv:2109.13066.
hanced text-to-SQL model with mixed local and non-
local relations. In Proceedings of the 59th Annual Jacob Devlin, Ming-Wei Chang, Kenton Lee, and
Meeting of the Association for Computational Lin- Kristina Toutanova. 2019. BERT: Pre-training of
guistics and the 11th International Joint Conference deep bidirectional transformers for language under-
on Natural Language Processing (Volume 1: Long standing. In Proceedings of the 2019 Conference
Papers), pages 2541–2555, Online. Association for of the North American Chapter of the Association
Computational Linguistics. for Computational Linguistics: Human Language
Shuaichen Chang, Pengfei Liu, Yun Tang, Jing Huang, Technologies, Volume 1 (Long and Short Papers),
Xiaodong He, and Bowen Zhou. 2020. Zero-shot pages 4171–4186, Minneapolis, Minnesota. Associ-
text-to-SQL learning with auxiliary task. In Pro- ation for Computational Linguistics.
ceedings of the AAAI Conference on Artificial Intel- Li Dong and Mirella Lapata. 2016. Language to logi-
ligence, volume 34, pages 7488–7495. cal form with neural attention. In Proceedings of the
Chieh-Yang Chen, Pei-Hsin Wang, Shih-Chieh Chang, 54th Annual Meeting of the Association for Compu-
Da-Cheng Juan, Wei Wei, and Jia-Yu Pan. 2020a. tational Linguistics (Volume 1: Long Papers), pages
AirConcierge: Generating task-oriented dialogue 33–43, Berlin, Germany. Association for Computa-
via efficient large-scale knowledge retrieval. In tional Linguistics.
Findings of the Association for Computational Lin-
guistics: EMNLP 2020, pages 884–897, Online. As- Li Dong and Mirella Lapata. 2018. Coarse-to-fine de-
sociation for Computational Linguistics. coding for neural semantic parsing. In Proceedings
of the 56th Annual Meeting of the Association for
Sanxing Chen, Aidan San, Xiaodong Liu, and Computational Linguistics (Volume 1: Long Papers),
Yangfeng Ji. 2020b. A tale of two linkings: Dy- pages 731–742, Melbourne, Australia. Association
namically gating between schema linking and struc- for Computational Linguistics.
tural linking for text-to-SQL parsing. In Proceed-
ings COLING-2020, the 28th International Confer- Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui
ence on Computational Linguistics, pages 2900– Wang, Jian-Guang Lou, Wanxiang Che, and Dechen
2912, Barcelona, Spain (Online). Association for Zhan. 2022. UniSAr: A unified structure-aware au-
Computational Linguistics. toregressive language model for text-to-SQL. ArXiv
preprint, abs/2203.07781.
Yongrui Chen, Xinnan Guo, Chaojie Wang, Jian
Qiu, Guilin Qi, Meng Wang, and Huiying Li. Ahmed Elgohary, Saghar Hosseini, and Ahmed Has-
2021a. Leveraging table content for zero-shot san Awadallah. 2020. Speak to your parser: Interac-
text-to-SQL with meta-learning. ArXiv preprint, tive text-to-SQL with natural language feedback. In
abs/2109.05395. Proceedings of the 58th Annual Meeting of the Asso-
ciation for Computational Linguistics, pages 2065–
Zhi Chen, Lu Chen, Yanbin Zhao, Ruisheng Cao, Zi- 2077, Online. Association for Computational Lin-
han Xu, Su Zhu, and Kai Yu. 2021b. ShadowGNN: guistics.
Graph projection neural network for text-to-SQL
parser. In Proceedings of the 2021 Conference of Catherine Finegan-Dollak, Jonathan K. Kummerfeld,
the North American Chapter of the Association for Li Zhang, Karthik Ramanathan, Sesh Sadasivam,
Computational Linguistics: Human Language Tech- Rui Zhang, and Dragomir Radev. 2018. Improving
nologies, pages 5567–5577, Online. Association for text-to-SQL evaluation methodology. In Proceed-
Computational Linguistics. ings of the 56th Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Pa- on Empirical Methods in Natural Language Process-
pers), pages 351–360, Melbourne, Australia. Asso- ing (EMNLP), pages 1520–1540, Online. Associa-
ciation for Computational Linguistics. tion for Computational Linguistics.
Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Jiaqi Guo, Ziliang Si, Yu Wang, Qian Liu, Ming Fan,
Model-agnostic meta-learning for fast adaptation of Jian-Guang Lou, Zijiang Yang, and Ting Liu. 2021.
deep networks. In Proceedings of the 34th Inter- Chase: A large-scale and pragmatic Chinese dataset
national Conference on Machine Learning, ICML for cross-database context-dependent text-to-SQL.
2017, Sydney, NSW, Australia, 6-11 August 2017, In Proceedings of the 59th Annual Meeting of the
volume 70 of Proceedings of Machine Learning Re- Association for Computational Linguistics and the
search, pages 1126–1135. PMLR. 11th International Joint Conference on Natural Lan-
guage Processing (Volume 1: Long Papers), pages
Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew 2316–2331, Online. Association for Computational
Purver, John R. Woodward, Jinxia Xie, and Peng- Linguistics.
sheng Huang. 2021a. Towards robustness of text-to-
SQL models against synonym substitution. In Pro- Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao,
ceedings of the 59th Annual Meeting of the Associa- Jian-Guang Lou, Ting Liu, and Dongmei Zhang.
tion for Computational Linguistics and the 11th In- 2019. Towards complex text-to-SQL in cross-
ternational Joint Conference on Natural Language domain database with intermediate representation.
Processing (Volume 1: Long Papers), pages 2505– In Proceedings of the 57th Annual Meeting of the
2515, Online. Association for Computational Lin- Association for Computational Linguistics, pages
guistics. 4524–4535, Florence, Italy. Association for Compu-
tational Linguistics.
Yujian Gan, Xinyun Chen, and Matthew Purver.
2021b. Exploring underexplored limitations of Tong Guo and Huilin Gao. 2018. Bidirectional
cross-domain text-to-SQL generalization. In Pro- attention for SQL generation. ArXiv preprint,
ceedings of the 2021 Conference on Empirical Meth- abs/1801.00076.
ods in Natural Language Processing, pages 8926–
8931, Online and Punta Cana, Dominican Republic. Tong Guo and Huilin Gao. 2019. Content en-
Association for Computational Linguistics. hanced BERT-based text-to-SQL generation. ArXiv
preprint, abs/1910.07179.
Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver,
John R. Woodward, John Drake, and Qiaofu Zhang. Moshe Hazoom, Vibhor Malik, and Ben Bogin. 2021.
2021c. Natural SQL: Making SQL easier to infer Text-to-SQL in the wild: A naturally-occurring
from natural language specifications. In Findings dataset based on stack exchange data. In Proceed-
of the Association for Computational Linguistics: ings of the 1st Workshop on Natural Language Pro-
EMNLP 2021, pages 2030–2042, Punta Cana, Do- cessing for Programming (NLP4Prog 2021), pages
minican Republic. Association for Computational 77–87, Online. Association for Computational Lin-
Linguistics. guistics.
Juri Ganitkevitch, Benjamin Van Durme, and Chris Pengcheng He, Yi Mao, Kaushik Chakrabarti, and
Callison-Burch. 2013. PPDB: The paraphrase Weizhu Chen. 2019. X-SQL: reinforce schema
database. In Proceedings of the 2013 Conference of representation with context. ArXiv preprint,
the North American Chapter of the Association for abs/1908.08113.
Computational Linguistics: Human Language Tech-
nologies, pages 758–764, Atlanta, Georgia. Associa- Charles T. Hemphill, John J. Godfrey, and George R.
tion for Computational Linguistics. Doddington. 1990. The ATIS spoken language sys-
tems pilot corpus. In Speech and Natural Language:
Jonathan L Gross, Jay Yellen, and Mark Anderson. Proceedings of a Workshop Held at Hidden Valley,
2018. Graph theory and its applications. Chapman Pennsylvania, June 24-27,1990.
and Hall/CRC.
Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin
Daya Guo, Yibo Sun, Duyu Tang, Nan Duan, Jian Yin, Guu, Panupong Pasupat, and Yuan Zhang. 2021. Un-
Hong Chi, James Cao, Peng Chen, and Ming Zhou. locking compositional generalization in pre-trained
2018. Question generation from SQL queries im- models using intermediate representations. ArXiv
proves neural semantic parsing. In Proceedings of preprint, abs/2104.07478.
the 2018 Conference on Empirical Methods in Nat-
ural Language Processing, pages 1597–1607, Brus- Sepp Hochreiter and Jürgen Schmidhuber. 1997.
sels, Belgium. Association for Computational Lin- Long short-term memory. Neural computation,
guistics. 9(8):1735–1780.
Jiaqi Guo, Qian Liu, Jian-Guang Lou, Zhenwen Li, Po-Sen Huang, Chenglong Wang, Rishabh Singh, Wen-
Xueqing Liu, Tao Xie, and Ting Liu. 2020. Bench- tau Yih, and Xiaodong He. 2018. Natural language
marking meaning representations in neural seman- to structured query generation via meta-learning. In
tic parsing. In Proceedings of the 2020 Conference Proceedings of the 2018 Conference of the North
American Chapter of the Association for Compu- Chia-Hsuan Lee, Oleksandr Polozov, and Matthew
tational Linguistics: Human Language Technolo- Richardson. 2021. KaggleDBQA: Realistic evalu-
gies, Volume 2 (Short Papers), pages 732–738, New ation of text-to-SQL parsers. In Proceedings of the
Orleans, Louisiana. Association for Computational 59th Annual Meeting of the Association for Compu-
Linguistics. tational Linguistics and the 11th International Joint
Conference on Natural Language Processing (Vol-
Binyuan Hui, Ruiying Geng, Qiyu Ren, Binhua Li, ume 1: Long Papers), pages 2261–2273, Online. As-
Yongbin Li, Jian Sun, Fei Huang, Luo Si, Pengfei sociation for Computational Linguistics.
Zhu, and Xiaodan Zhu. 2021a. Dynamic hybrid re-
lation network for cross-domain context-dependent Dongjun Lee. 2019. Clause-wise and recursive decod-
semantic parsing. ArXiv preprint, abs/2101.01686. ing for complex and cross-domain text-to-SQL gen-
eration. In Proceedings of the 2019 Conference on
Binyuan Hui, Ruiying Geng, Lihan Wang, Bowen Empirical Methods in Natural Language Processing
Qin, Bowen Li, Jian Sun, and Yongbin Li. 2022. and the 9th International Joint Conference on Natu-
S2 SQL: Injecting syntax to question-schema inter- ral Language Processing (EMNLP-IJCNLP), pages
action graph encoder for text-to-SQL parsers. ArXiv 6045–6051, Hong Kong, China. Association for
preprint, abs/2203.06958. Computational Linguistics.
Binyuan Hui, Xiang Shi, Ruiying Geng, Binhua Li, Wenqiang Lei, Weixin Wang, Zhixin Ma, Tian Gan,
Yongbin Li, Jian Sun, and Xiaodan Zhu. 2021b. Im- Wei Lu, Min-Yen Kan, and Tat-Seng Chua. 2020.
proving text-to-SQL with schema dependency learn- Re-examining the role of schema linking in text-to-
ing. ArXiv preprint, abs/2103.04399. SQL. In Proceedings of the 2020 Conference on
Empirical Methods in Natural Language Process-
Wonseok Hwang, Jinyeong Yim, Seunghyun Park, and ing (EMNLP), pages 6943–6954, Online. Associa-
Minjoon Seo. 2019. A comprehensive exploration tion for Computational Linguistics.
on WikiSQL with table-aware word contextualiza-
tion. ArXiv preprint, abs/1902.01069. Mike Lewis, Yinhan Liu, Naman Goyal, Mar-
jan Ghazvininejad, Abdelrahman Mohamed, Omer
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Levy, Veselin Stoyanov, and Luke Zettlemoyer.
Krishnamurthy, and Luke Zettlemoyer. 2017. Learn- 2020. BART: Denoising sequence-to-sequence pre-
ing a neural semantic parser from user feedback. In training for natural language generation, translation,
Proceedings of the 55th Annual Meeting of the As- and comprehension. In Proceedings of the 58th An-
sociation for Computational Linguistics (Volume 1: nual Meeting of the Association for Computational
Long Papers), pages 963–973, Vancouver, Canada. Linguistics, pages 7871–7880, Online. Association
Association for Computational Linguistics. for Computational Linguistics.
Tanzim Mahmud, KM Azharul Hasan, Mahtab Ahmed, and Thwoi Hla Ching Chak. 2015. A rule based approach for NLP based query processing. In 2015 2nd International Conference on Electrical Information and Communication Technologies (EICT), pages 78–82. IEEE.

Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. The natural language decathlon: Multitask learning as question answering. ArXiv preprint, abs/1806.08730.

Qingkai Min, Yuefeng Shi, and Yue Zhang. 2019a. A pilot study for Chinese SQL semantic parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3652–3658, Hong Kong, China. Association for Computational Linguistics.

Sewon Min, Danqi Chen, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2019b. A discrete hard EM approach for weakly supervised question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China. Association for Computational Linguistics.

Karthik Radhakrishnan, Arvind Srikantan, and Xi Victoria Lin. 2020. ColloQL: Robust cross-domain text-to-SQL over search queries. ArXiv preprint, abs/2010.09927.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. ArXiv preprint, abs/1910.10683.

Ohad Rubin and Jonathan Berant. 2021. SmBoP: Semi-autoregressive bottom-up semantic parsing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 311–324, Online. Association for Computational Linguistics.

Irina Saparina and Anton Osokin. 2021. SPARQLing database queries from intermediate question decompositions. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8984–8998, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Torsten Scholak, Raymond Li, Dzmitry Bahdanau, Harm de Vries, and Chris Pal. 2021a. DuoRAT: Towards simpler text-to-SQL models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1313–1321, Online. Association for Computational Linguistics.

Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. 2021b. PICARD: Parsing incrementally for constrained auto-regressive decoding from language models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9895–9901, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.

Peter Shaw, Ming-Wei Chang, Panupong Pasupat, and Kristina Toutanova. 2021. Compositional generalization and natural language variation: Can a semantic parsing approach handle both? In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 922–938, Online. Association for Computational Linguistics.

Peng Shi, Patrick Ng, Zhiguo Wang, Henghui Zhu, Alexander Hanbo Li, Jun Wang, Cicero Nogueira dos Santos, and Bing Xiang. 2020a. Learning contextual representations for semantic parsing with generation-augmented pre-training. ArXiv preprint, abs/2012.10309.

Peng Shi, Tao Yu, Patrick Ng, and Zhiguo Wang. 2021. End-to-end cross-domain text-to-SQL semantic parsing with auxiliary task. ArXiv preprint, abs/2106.09588.

Tianze Shi, Kedar Tatwawadi, Kaushik Chakrabarti, Yi Mao, Oleksandr Polozov, and Weizhu Chen. 2018. IncSQL: Training incremental text-to-SQL parsers with non-deterministic oracles. ArXiv preprint, abs/1809.05054.

Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daumé III, and Lillian Lee. 2020b. On the potential of lexico-logical alignments for semantic parsing to SQL queries. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1849–1864, Online. Association for Computational Linguistics.

Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pages 2960–2968.

Yuanfeng Song, Raymond Chi-Wing Wong, Xuefang Zhao, and Di Jiang. 2022. Speech-to-SQL: Towards speech-driven SQL query generation from natural language question. ArXiv preprint, abs/2201.01209.

Alane Suhr, Ming-Wei Chang, Peter Shaw, and Kenton Lee. 2020. Exploring unexplored generalization challenges for cross-database semantic parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8372–8388, Online. Association for Computational Linguistics.

Alane Suhr, Srinivasan Iyer, and Yoav Artzi. 2018. Learning to map context-dependent sentences to executable formal queries. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2238–2249, New Orleans, Louisiana. Association for Computational Linguistics.

Ningyuan Sun, Xuefeng Yang, and Yunfeng Liu. 2020. TableQA: A large-scale Chinese text-to-SQL dataset for table-aware SQL generation. ArXiv preprint, abs/2006.06434.

Lappoon R. Tang and Raymond J. Mooney. 2000. Automated construction of database interfaces: Integrating statistical and relational learning for semantic parsing. In 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 133–141, Hong Kong, China. Association for Computational Linguistics.

Yasufumi Taniguchi, Hiroki Nakayama, Kubo Takahiro, and Jun Suzuki. 2021. An investigation between schema linking and text-to-SQL performance. ArXiv preprint, abs/2102.01847.

Anh Tuan Nguyen, Mai Hoang Dao, and Dat Quoc Nguyen. 2020. A pilot study of text-to-SQL semantic parsing for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4079–4085, Online. Association for Computational Linguistics.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 2692–2700.
Bailin Wang, Mirella Lapata, and Ivan Titov. 2021a. Meta-learning for domain generalization in semantic parsing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 366–379, Online. Association for Computational Linguistics.

Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2020a. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7567–7578, Online. Association for Computational Linguistics.

Bailin Wang, Ivan Titov, and Mirella Lapata. 2019. Learning semantic parsers from denotations with latent structured alignments and abstract programs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3774–3785, Hong Kong, China. Association for Computational Linguistics.

Bailin Wang, Wenpeng Yin, Xi Victoria Lin, and Caiming Xiong. 2021b. Learning to synthesize data for semantic parsing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2760–2766, Online. Association for Computational Linguistics.

Chenglong Wang, Marc Brockschmidt, and Rishabh Singh. 2018a. Pointing out SQL queries from text.

Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. 2018b. Robust text-to-SQL generation with execution-guided decoding. ArXiv preprint, abs/1807.03100.

Huajie Wang, Mei Li, and Lei Chen. 2020b. PG-GSQL: Pointer-generator network with guide decoding for cross-domain context-dependent text-to-SQL generation. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pages 370–380, Barcelona, Spain (Online). Association for Computational Linguistics.

Lijie Wang, Ao Zhang, Kun Wu, Ke Sun, Zhenghua Li, Hua Wu, Min Zhang, and Haifeng Wang. 2020c. DuSQL: A large-scale and pragmatic Chinese text-to-SQL dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6923–6935, Online. Association for Computational Linguistics.

Ping Wang, Tian Shi, and Chandan K. Reddy. 2020d. Text-to-SQL generation for question answering on electronic medical records. In WWW '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020, pages 350–361. ACM / IW3C2.

W. Woods, Ronald Kaplan, and Bonnie Webber. 1972. The lunar sciences natural language information system.

William A Woods. 1973. Progress in natural language understanding: An application to lunar geology. In Proceedings of the June 4-8, 1973, National Computer Conference and Exposition, pages 441–450.

Kun Wu, Lijie Wang, Zhenghua Li, Ao Zhang, Xinyan Xiao, Hua Wu, Min Zhang, and Haifeng Wang. 2021. Data augmentation with hierarchical SQL-to-question generation for cross-domain text-to-SQL parsing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8974–8983, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I Wang, et al. 2022. UnifiedSKG: Unifying and multi-tasking structured knowledge grounding with text-to-text language models. ArXiv preprint, abs/2201.05966.

Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie Chi Kit Cheung, Simon J.D. Prince, and Yanshuai Cao. 2021. Optimizing deeper transformers on small datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2089–2102, Online. Association for Computational Linguistics.

Silei Xu, Sina Semnani, Giovanni Campagna, and Monica Lam. 2020. AutoQA: From databases to QA semantic parsers with only synthetic training data. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 422–434, Online. Association for Computational Linguistics.

Xiaojun Xu, Chang Liu, and Dawn Song. 2017. SQLNet: Generating structured queries from natural language without reinforcement learning. ArXiv preprint, abs/1711.04436.

Kuan Xuan, Yongbo Wang, Yongliang Wang, Zujie Wen, and Yang Dong. 2021. SeaD: End-to-end text-to-SQL generation with schema-aware denoising. ArXiv preprint, abs/2105.07911.

Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: Query synthesis from natural language. Proceedings of the ACM on Programming Languages, 1(OOPSLA):1–26.

Zeyu Yan, Jianqiang Ma, Yang Zhang, and Jianping Shen. 2020. SQL generation via machine reading comprehension. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pages 350–356, Barcelona, Spain (Online). Association for Computational Linguistics.
Ziyu Yao, Yu Su, Huan Sun, and Wen-tau Yih. 2019. Model-based interactive semantic parsing: A unified framework and a text-to-SQL case study. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5447–5458, Hong Kong, China. Association for Computational Linguistics.

Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, and Yu Su. 2020. An imitation game for learning semantic parsers from user interaction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6883–6902, Online. Association for Computational Linguistics.

Xi Ye, Qiaochu Chen, Xinyu Wang, Isil Dillig, and Greg Durrett. 2020. Sketch-driven regular expression generation from natural language and examples. Transactions of the Association for Computational Linguistics, 8:679–694.

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 440–450, Vancouver, Canada. Association for Computational Linguistics.

Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. 2020. TaBERT: Pretraining for joint understanding of textual and tabular data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8413–8426, Online. Association for Computational Linguistics.

Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, and Dragomir Radev. 2018a. TypeSQL: Knowledge-based type-aware neural text-to-SQL generation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 588–594, New Orleans, Louisiana. Association for Computational Linguistics.

Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir R. Radev, Richard Socher, and Caiming Xiong. 2021. GraPPa: Grammar-augmented pre-training for table semantic parsing. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net.

Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, and Dragomir Radev.

Tao Yu, Rui Zhang, Heyang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter Lasecki, and Dragomir Radev. 2019a. CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1962–1979, Hong Kong, China. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018c. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921, Brussels, Belgium. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019b. SParC: Cross-domain semantic parsing in context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4511–4523, Florence, Italy. Association for Computational Linguistics.

John M Zelle and Raymond J Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the National Conference on Artificial Intelligence, pages 1050–1055.

Rowan Zellers, Ari Holtzman, Elizabeth Clark, Lianhui Qin, Ali Farhadi, and Yejin Choi. 2021. TuringAdvice: A generative and dynamic evaluation of language use. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4856–4880, Online. Association for Computational Linguistics.

Jichuan Zeng, Xi Victoria Lin, Steven C.H. Hoi, Richard Socher, Caiming Xiong, Michael Lyu, and Irwin King. 2020. Photon: A robust cross-domain text-to-SQL system. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 204–214, Online. Association for Computational Linguistics.
2018b. SyntaxSQLNet: Syntax tree networks for tics.
complex and cross-domain text-to-SQL task. In Pro-
ceedings of the 2018 Conference on Empirical Meth- Rui Zhang, Tao Yu, Heyang Er, Sungrok Shim,
ods in Natural Language Processing, pages 1653– Eric Xue, Xi Victoria Lin, Tianze Shi, Caim-
1663, Brussels, Belgium. Association for Computa- ing Xiong, Richard Socher, and Dragomir Radev.
tional Linguistics. 2019. Editing-based SQL query generation for
cross-domain context-dependent questions. In Pro- A Topology for Text-to-SQL
ceedings of the 2019 Conference on Empirical Meth-
ods in Natural Language Processing and the 9th In- Figure 5 shows the topology for the text-to-SQL
ternational Joint Conference on Natural Language task.
Processing (EMNLP-IJCNLP), pages 5338–5349,
Hong Kong, China. Association for Computational
Linguistics.
B Text-to-SQL Examples
Yusen Zhang, Xiangyu Dong, Shuaichen Chang, Tao B.1 Table and Database
Yu, Peng Shi, and Rui Zhang. 2020. Did you ask a Table 6 shows an example of the table in the
good question? a cross-domain question intention
classification benchmark for text-to-SQL. ArXiv
database for Restaurants dataset. The domain for
preprint, abs/2010.12634. this dataset is restaurant information, where ques-
tions are typically about food type, restaurant loca-
Yanzhao Zheng, Haibin Wang, Baohua Dong, Xingjun tion, etc.
Wang, and Changshan Li. 2022. HIE-SQL: His-
tory information enhanced network for context- There is a big difference in terms of how many
dependent text-to-SQL semantic parsing. ArXiv tables a database has. For restaurants, there are 3
preprint, abs/2203.07376. tables in the database, while there are 32 tables in
Ruiqi Zhong, Tao Yu, and Dan Klein. 2020a. Semantic ATIS (Suhr et al., 2020).
evaluation for text-to-SQL with distilled test suites.
In Proceedings of the 2020 Conference on Empirical B.2 Domain Knowledge
Methods in Natural Language Processing (EMNLP), Question: Will undergrads be okay to take 581 ?
pages 396–411, Online. Association for Computa-
tional Linguistics. SQL query:
SELECT DISTINCT T1.ADVISORY_REQUIREMENT ,
Victor Zhong, Mike Lewis, Sida I. Wang, and Luke T1.ENFORCED_REQUIREMENT , T1.NAME FROM
Zettlemoyer. 2020b. Grounded adaptation for zero- COURSE AS T1 WHERE T1.DEPARTMENT =
shot executable semantic parsing. In Proceedings of "EECS" AND T1.NUMBER = 581 ;
the 2020 Conference on Empirical Methods in Nat-
ural Language Processing (EMNLP), pages 6869– In Advising dataset, Department “EECS” is con-
6882, Online. Association for Computational Lin- sidered as domain knowledge where “581” in the
guistics. utterance means a course in “EECS” department
Victor Zhong, Caiming Xiong, and Richard Socher. with course number “581”.
2017. Seq2SQL: Generating structured queries
from natural language using reinforcement learning. B.3 Dataset Convention
ArXiv preprint, abs/1709.00103.
Question: Give me some restaurants in alameda ?
Jiawei Zhou, Jason Eisner, Michael Newman, Em- SQL query:
manouil Antonios Platanios, and Sam Thomson. SELECT T1.HOUSE_NUMBER ,
2022. Online semantic parsing for latency reduc- T2.NAME FROM LOCATION AS T1 , RESTAURANT
tion in task-oriented dialogue. In Proceedings of the AS T2 WHERE T1.CITY_NAME = "alameda"
60th Annual Meeting of the Association for Compu- AND T2.ID = T1.RESTAURANT_ID ;
tational Linguistics (Volume 1: Long Papers), pages
1554–1576, Dublin, Ireland. Association for Compu- In Restaurants dataset, when the user queries
tational Linguistics. “restaurants”, by dataset convention, the cor-
responding SQL query returns the column
“HOUSE_NUMBER” and “NAME”.
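The dataset-convention query from B.3 can be run end-to-end on a toy database. The sketch below uses a minimal, hypothetical slice of the Restaurants schema (only the columns the query references; the real dataset has 3 full tables) and illustrative rows:

```python
import sqlite3

# Hypothetical slice of the Restaurants schema, modeled on the columns
# referenced by the B.3 example query; rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LOCATION (
    RESTAURANT_ID INTEGER,
    HOUSE_NUMBER  INTEGER,
    CITY_NAME     TEXT
);
CREATE TABLE RESTAURANT (
    ID   INTEGER PRIMARY KEY,
    NAME TEXT
);
INSERT INTO RESTAURANT VALUES (1, 'LA VALS'), (2, 'SPATS');
INSERT INTO LOCATION VALUES (1, 2516, 'alameda'), (2, 1974, 'berkeley');
""")

# Dataset convention: "restaurants" implicitly means returning
# HOUSE_NUMBER and NAME, not SELECT *.
rows = conn.execute("""
SELECT T1.HOUSE_NUMBER, T2.NAME
FROM LOCATION AS T1, RESTAURANT AS T2
WHERE T1.CITY_NAME = 'alameda' AND T2.ID = T1.RESTAURANT_ID
""").fetchall()
print(rows)  # [(2516, 'LA VALS')]
```

A model that predicted `SELECT *` here would be penalized under exact-match evaluation, even though the user's intent is arguably satisfied; this is what makes dataset conventions a source of evaluation noise.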
[Table 5: Topology for text-to-SQL. Format adapted from Liu et al. (2021a). Recoverable branches include multilingual datasets derived from WikiSQL and Spider (TableQA (zh); DuSQL (zh); ViText2SQL (vi); CSpider (zh); PortugueseSpider (pt)) and decoding methodologies (§3): tree-based; sketch-based; bottom-up; attention mechanism; copy mechanism; intermediate representation; others.]
Table 6: Geography, one of the tables in the Restaurants database. * denotes the primary key of this table. We only include 3 rows for demonstration purposes.

where they populate the slots in the templates with table and column names from the database schema, as well as join the corresponding tables accordingly.

Generated question: Get all author having dataset as DATASET_TYPE
Generated SQL query:
SELECT author.authorId
FROM author , writes , paper ,

An example of the PPDB (Ganitkevitch et al., 2013) paraphrasing is “thrown into jail” and “imprisoned”. The English portion of PPDB contains over 220 million paraphrasing pairs.
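The template-population idea above can be sketched as follows. The templates, schema, and slot names here are hypothetical illustrations of the mechanism, not the templates used by any specific augmentation system:

```python
# Hypothetical sketch of template-based data augmentation: a question
# template and a SQL template share slots that are filled consistently
# with table and column names drawn from the database schema.
schema = {
    "author": ["authorId", "authorName"],
    "paper": ["paperId", "title"],
}

QUESTION_TMPL = "Get all {column} having {filter_col} as {value}"
SQL_TMPL = "SELECT {table}.{column} FROM {table} WHERE {table}.{filter_col} = '{value}'"

def populate(table, column, filter_col, value):
    # Validate slots against the schema before filling both templates,
    # so the generated question and SQL query stay aligned.
    assert column in schema[table] and filter_col in schema[table]
    question = QUESTION_TMPL.format(column=column, filter_col=filter_col,
                                    value=value)
    sql = SQL_TMPL.format(table=table, column=column,
                          filter_col=filter_col, value=value)
    return question, sql

question, sql = populate("author", "authorId", "authorName", "DATASET_TYPE")
print(question)  # Get all authorId having authorName as DATASET_TYPE
print(sql)  # SELECT author.authorId FROM author WHERE author.authorName = 'DATASET_TYPE'
```

A paraphrasing resource such as PPDB can then be applied to the generated question to diversify its surface form while keeping the paired SQL query fixed.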
B.5 Complexity of Natural Language and SQL Query Pairs

In terms of the complexity of SQL queries, Finegan-Dollak et al. (2018) find that models perform better on shorter SQL queries than on longer ones, which indicates that shorter SQL queries are easier in general. Yu et al. (2018c) define SQL hardness as the number of SQL components: a SQL query is harder when it contains more SQL keywords such as GROUP BY and nested subqueries. Yu et al. (2018c) give examples of SQL queries at different difficulty levels:

Easy:
SELECT COUNT(*)
FROM cars_data
WHERE cylinders > 4 ;

Medium:
SELECT T2.name, COUNT(*)
FROM concert AS T1 JOIN stadium AS T2
    ON T1.stadium_id = T2.stadium_id
GROUP BY T1.stadium_id ;

Hard:
SELECT T1.country_name
FROM countries AS T1 JOIN continents AS T2
    ON T1.continent = T2.cont_id
JOIN car_makers AS T3
    ON T1.country_id = T3.country
WHERE T2.continent = 'Europe'
GROUP BY T1.country_name
HAVING COUNT(*) >= 3 ;

Extra Hard:
SELECT AVG(life_expectancy) FROM country
WHERE name NOT IN
    (SELECT T1.name
     FROM country AS T1 JOIN
         country_language AS T2
         ON T1.code = T2.country_code
     WHERE T2.language = "English"
     AND T2.is_official = "T") ;

In terms of the complexity of the natural utterance, there is no quantitative measure of how hard an utterance is. Intuitively, models' performance can decrease when faced with longer questions from users. However, the information conveyed in longer sentences can be more complete, while shorter sentences can be ambiguous. Besides, domain-specific phrases can confuse the model in both short and long utterances (Suhr et al., 2020). Thus, researchers need to consider various perspectives to determine the complexity of a natural utterance.

C Text-to-SQL Datasets

Table 7 lists statistics for text-to-SQL datasets.

C.1 More Discussion on Text-to-SQL Datasets

CSpider (Min et al., 2019a), ViText2SQL (Tuan Nguyen et al., 2020) and José and Cozman (2021) translate all the English questions in Spider into Chinese, Vietnamese and Portuguese, respectively. TableQA (Sun et al., 2020) follows the data collection method of WikiSQL, while DuSQL (Wang et al., 2020c) follows Spider. Both TableQA and DuSQL collect Chinese utterance and SQL query pairs across different domains. Chen et al. (2021a) propose a Chinese domain-specific dataset, ESQL.

For multi-turn context-dependent text-to-SQL benchmarks, ATIS (Price, 1990; Dahl et al., 1994) includes user interactions with a SQL flight database over multiple turns. SParC (Yu et al., 2019b) takes a further step to collect multi-turn interactions across 200 databases and 138 domains. However, both ATIS and SParC assume all user questions can be mapped into SQL queries and do not include system responses. Later, inspired by task-oriented dialogue systems (Budzianowski et al., 2018), Yu et al. (2019a) propose CoSQL, in which the dialogue state is tracked by SQL. CoSQL includes three tasks: SQL-grounded dialogue state tracking to generate SQL queries from the user's utterances, system response generation from query results, and user dialogue act prediction to detect and resolve ambiguous and unanswerable questions.

Besides, TriageSQL (Zhang et al., 2020) collects unanswerable questions in addition to natural utterance and SQL query pairs from Spider and WikiSQL, bringing up the challenge of distinguishing answerable questions from unanswerable ones in text-to-SQL systems.

D Encoding and Decoding Method

Table 8 and Table 9 show the encoding and decoding methods that have been discussed in § 3.2 and § 3.3, respectively.

E Other Related Tasks

Other tasks related to text-to-SQL include text-to-python (Bonthu et al., 2021), text-to-shell script/bash script (Bharadwaj and Shevade, 2022), text-to-regex (Ye et al., 2020), text-to-SPARQL (Ochieng, 2020), etc. They all take natural language queries as input and output different logical forms. Among these tasks, text-to-SPARQL is closest to text-to-SQL as both SPARQL and SQL
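The component-counting notion of SQL hardness from Appendix B.5 can be sketched as a rough heuristic. The keyword list, the extra weight on nesting, and the score thresholds below are hypothetical illustrations, not the official Spider evaluation criteria:

```python
# Illustrative heuristic in the spirit of Yu et al. (2018c): hardness
# grows with the number of SQL components (joins, grouping, set
# operations, ...) and especially with nested subqueries. Thresholds
# are hypothetical, not the official Spider criteria.
COMPONENTS = ["JOIN", "GROUP BY", "HAVING", "ORDER BY", "LIMIT",
              "INTERSECT", "EXCEPT", "UNION", "NOT IN", "LIKE"]

def hardness(sql: str) -> str:
    upper = " ".join(sql.upper().split())  # normalize whitespace
    score = sum(upper.count(kw) for kw in COMPONENTS)
    score += 3 * upper.count("(SELECT")    # nested subqueries weigh more
    if score == 0:
        return "easy"
    if score <= 2:
        return "medium"
    if score <= 4:
        return "hard"
    return "extra hard"

print(hardness("SELECT COUNT(*) FROM cars_data WHERE cylinders > 4 ;"))  # easy
```

On the four example queries in B.5, this toy scorer happens to reproduce the labels (0, 2, 4, and 5 components respectively), but it is only a sketch of the idea of grading queries by component count.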
| Datasets | #Size | #DB | #D | #T/DB | Issues addressed | Sources for data |
|---|---|---|---|---|---|---|
| Spider (Yu et al., 2018c) | 10,181 | 200 | 138 | 5.1 | Domain generalization | College courses, DatabaseAnswers, WikiSQL |
| Spider-DK (Gan et al., 2021b) | 535 | 10 | - | 4.8 | Domain knowledge | Spider dev set |
| SpiderUtran (Zeng et al., 2020) | 15,023 | 200 | 138 | 5.1 | Untranslatable questions | Spider + 5,330 untranslatable questions |
| Spider-L (Lei et al., 2020) | 8,034 | 160 | - | 5.1 | Schema linking | Spider train/dev |
| SpiderSL (Taniguchi et al., 2021) | 1,034 | 10 | - | 4.8 | Schema linking | Spider dev set |
| Spider-Syn (Gan et al., 2021a) | 8,034 | 160 | - | 5.1 | Robustness | Spider train/dev |
| WikiSQL (Zhong et al., 2017) | 80,654 | 26,521 | - | 1 | Data size | Wikipedia |
| Squall (Shi et al., 2020b) | 11,468 | 1,679 | - | 1 | Lexicon-level supervision | WikiTableQuestions (Pasupat and Liang, 2015) |
| KaggleDBQA (Lee et al., 2021) | 272 | 8 | 8 | 2.3 | Domain generalization | Real web databases |

| Datasets | #Size | #DB | #D | #T/DB | Issues addressed | Sources for data |
|---|---|---|---|---|---|---|
| ATIS (Price, 1990; Dahl et al., 1994) | 5,280 | 1 | 1 | 32 | - | Flight-booking |
| GeoQuery (Zelle and Mooney, 1996) | 877 | 1 | 1 | 6 | - | US geography |
| Scholar (Iyer et al., 2017) | 817 | 1 | 1 | 7 | - | Academic publications |
| Academic (Li and Jagadish, 2014) | 196 | 1 | 1 | 15 | - | Microsoft Academic Search (MAS) database |
| IMDB (Yaghmazadeh et al., 2017) | 131 | 1 | 1 | 16 | - | Internet Movie Database |
| Yelp (Yaghmazadeh et al., 2017) | 128 | 1 | 1 | 7 | - | Yelp website |
| Advising (Finegan-Dollak et al., 2018) | 3,898 | 1 | 1 | 10 | - | University of Michigan course information |
| Restaurants (Tang and Mooney, 2000; Popescu et al., 2003) | 378 | 1 | 1 | 3 | - | Restaurants |
| MIMICSQL (Wang et al., 2020d) | 10,000 | 1 | 1 | 5 | - | Healthcare domain |
| SEDE (Hazoom et al., 2021) | 12,023 | 1 | 1 | 29 | SQL template diversity | Stack Exchange |

Table 7: Summarization for text-to-SQL datasets. #Size, #DB, #D, and #T/DB represent the number of question-SQL pairs, databases, domains, and tables per domain, respectively. We put "-" in the #D column when we do not know how many domains a dataset covers, and "-" in the Issues Addressed column when there is no specific issue addressed by the dataset. The upper and lower tables list cross-domain and single-domain datasets, respectively.
Table 9: Methods used for decoding in text-to-SQL. ♠: Academic, Advising, ATIS, GeoQuery, Yelp, IMDB, Scholar, Restaurants; ♥: TableQA, DuSQL, CoSQL, SParC, Chase.