Commit 2cc4c0b

tswast and dandhlee authored
doc: share design document for query retry logic (#1123)
* doc: share design document for query retry logic
* add design document to contents tree
* clarify a few points
* Update docs/design/query-retries.md

Co-authored-by: Dan Lee <71398022+dandhlee@users.noreply.github.com>
1 parent e760d1b commit 2cc4c0b

3 files changed

+127
-0
lines changed

docs/design/index.rst

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
Client Library Design
2+
=====================
3+
4+
Some features of this client library have complex requirements and/or
5+
implementation. These documents describe the design decisions that contributed
6+
to those features.
7+
8+
.. toctree::
9+
:maxdepth: 2
10+
11+
query-retries

docs/design/query-retries.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
1+
# Design of query retries in the BigQuery client libraries for Python
2+
3+
4+
## Overview
5+
6+
The BigQuery client libraries for Python must safely retry API requests related to initiating a query. Here, "safely" means that the BigQuery backend never successfully executes the query twice. This avoids duplicated rows from INSERT DML queries, among other problems.
7+
8+
To achieve this goal, the client library only retries an API request relating to queries if at least one of the following is true: (1) issuing this exact request is idempotent, meaning that it won't result in a duplicate query being issued, or (2) the query has already failed in such a way that it is safe to re-issue the query.
9+
10+
11+
## Background
12+
13+
14+
### API-level retries
15+
16+
Retries for nearly all API requests were [added in 2017](https://github.com/googleapis/google-cloud-python/pull/4148) and are [configurable via a Retry object](https://googleapis.dev/python/google-api-core/latest/retry.html#google.api_core.retry.Retry) passed to the retry argument. Notably, this includes the "query" method on the Python client, corresponding to the [jobs.insert REST API method](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/insert). The Python client always populates the [jobReference.jobId](https://cloud.google.com/bigquery/docs/reference/rest/v2/JobReference#FIELDS.job_id) field of the request body. If the BigQuery REST API receives a jobs.insert request for a job with the same ID, the REST API fails because the job already exists.
17+
18+
19+
### jobs.insert and jobs.query API requests
20+
21+
By default, the Python client starts a query using the [jobs.insert REST API
22+
method](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/insert).
23+
Support for the [jobs.query REST API
24+
method](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query)
25+
was [added via the `api_method`
26+
parameter](https://github.com/googleapis/python-bigquery/pull/967) and is
27+
included in version 3.0 of the Python client library.
28+
29+
The jobs.query REST API method differs from jobs.insert in that it does not accept a job ID. Instead, the [requestId parameter](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#QueryRequest.FIELDS.request_id) provides a window of idempotency for duplicate requests.
30+
31+
32+
### Re-issuing a query
33+
34+
The ability to re-issue a query automatically was a [long](https://github.com/googleapis/google-cloud-python/issues/5555) [requested](https://github.com/googleapis/python-bigquery/issues/14) [feature](https://github.com/googleapis/python-bigquery/issues/539). As work ramped up on the SQLAlchemy connector, it became clear that this feature was necessary to keep the test suite, which issues hundreds of queries, from being [too flaky](https://github.com/googleapis/python-bigquery-sqlalchemy/issues?q=is%3Aissue+is%3Aclosed+author%3Aapp%2Fflaky-bot+sort%3Acreated-asc).
35+
36+
Retrying a query is not as simple as retrying a single API request. In many
37+
cases the client library does not "know" about a query job failure until it
38+
tries to fetch the query results. To solve this, the [client re-issues a
39+
query](https://github.com/googleapis/python-bigquery/pull/837) as it was
40+
originally issued only if the query job has failed for a retryable reason.
41+
42+
43+
### getQueryResults error behavior
44+
45+
The client library uses [the jobs.getQueryResults REST API method](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults) to wait for a query to finish. This REST API has a unique behavior in that it translates query job failures into HTTP error status codes. To disambiguate these error responses from one that may have occurred further up the REST API stack (such as from the Google load balancer), the client library inspects the error response body.
46+
47+
When the error corresponds to a query job failure, BigQuery populates the
48+
"errors" array field, with the first element in the list corresponding to the
49+
error which directly caused the job failure. There are many [error response
50+
messages](https://cloud.google.com/bigquery/docs/error-messages), but only some
51+
of them indicate that re-issuing the query job may help. For example, if the
52+
job fails due to invalid query syntax, re-issuing the query won't help. If a
53+
query job fails due to "backendError" or "rateLimitExceeded", we know that the
54+
job failed for a reason unrelated to the query itself, so re-issuing it may help.
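The check described above can be sketched as follows. This is a minimal illustration, not the client library's actual code: the function name is hypothetical, the `errors` argument is assumed to be the already-parsed "errors" array (a list of dicts with a "reason" key), and only the two reasons named in this document are treated as re-issuable.

```python
# Reasons for which re-issuing the query may help. Only the two reasons
# named in this document are listed; the real set may differ (assumption).
RETRYABLE_REASONS = frozenset({"backendError", "rateLimitExceeded"})


def job_failure_is_retryable(errors):
    """Return True if the first error's reason suggests re-issuing may help.

    `errors` is assumed to be the parsed "errors" array from the error
    response body: a list of dicts, each with a "reason" key.
    """
    if not errors:
        # No "errors" array: this sketch treats the response as something
        # other than a query job failure (for example, a load balancer error).
        return False
    # The first element is the error that directly caused the job failure.
    return errors[0].get("reason") in RETRYABLE_REASONS


print(job_failure_is_retryable([{"reason": "backendError"}]))  # True
print(job_failure_is_retryable([{"reason": "invalidQuery"}]))  # False
```

Only the first element is inspected because, as described above, it is the error that directly caused the job failure.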
55+
56+
57+
## Detailed design
58+
59+
As mentioned in the "Overview" section, the Python client only retries a query request if at least one of the following is true: (1) issuing this exact request is idempotent, meaning that it won't result in a duplicate query being issued, or (2) the query has already failed in such a way that it is safe to re-issue the query.
60+
61+
A developer can configure when to retry an API request (corresponding to #1 "issuing this exact request is idempotent") via the query method's `retry` parameter. A developer can configure when to re-issue a query job after a job failure (corresponding to #2 "the query has already failed") via the query method's `job_retry` parameter.
62+
63+
64+
### Retrying API requests via the `retry` parameter
65+
66+
The first set of retries is at the API layer. The client library sends an
67+
identical request if the request is idempotent.
68+
69+
#### Retrying the jobs.insert API via the retry parameter
70+
71+
When the `api_method` parameter is set to `"INSERT"`, which is the default
72+
value, the client library uses the jobs.insert REST API to start a query job.
73+
Before it issues this request, it sets a job ID. This job ID remains constant
74+
across API retries.
75+
76+
If the job ID was randomly generated, and the jobs.insert request and all retries fail, the client library sends a request to the jobs.get API. This covers the case when a query request succeeded, but there was a transient issue that prevented the client from receiving a successful response.
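The flow above can be illustrated with a toy simulation. Everything in it is hypothetical (`ApiError`, `FakeBackend`, `start_query`, and the fixed attempt count are stand-ins, not the client library's API); it only aims to show why a constant job ID plus a final lookup recovers a job whose success response was lost.

```python
import uuid


class ApiError(Exception):
    """Stand-in for any failed API request (hypothetical)."""


class FakeBackend:
    """Toy backend whose first insert succeeds but loses the response."""

    def __init__(self):
        self.jobs = {}

    def insert(self, job_id, sql):
        if job_id in self.jobs:
            raise ApiError("job already exists")
        self.jobs[job_id] = sql
        raise ApiError("connection reset before the response arrived")

    def get(self, job_id):
        if job_id not in self.jobs:
            raise ApiError("job not found")
        return {"jobId": job_id, "query": self.jobs[job_id]}


def start_query(backend, sql, job_id=None, max_attempts=3):
    generated = job_id is None
    if generated:
        job_id = str(uuid.uuid4())  # chosen once, before the first request
    last_error = None
    for _ in range(max_attempts):
        try:
            # The same job ID is sent on every attempt, so a backend that
            # rejects duplicate job IDs cannot start the query twice.
            return backend.insert(job_id, sql)
        except ApiError as exc:
            last_error = exc
    if generated:
        # An earlier attempt may have succeeded without the client seeing
        # the response, so look the job up by its ID.
        return backend.get(job_id)
    raise last_error


backend = FakeBackend()
job = start_query(backend, "SELECT 1")
print(len(backend.jobs))  # 1
```

In this sketch every insert attempt reports an error, yet exactly one job exists on the backend and the final lookup returns it.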
77+
78+
79+
#### Retrying the jobs.query API via the retry parameter
80+
81+
When the `api_method` parameter is set to `"QUERY"` (available in version 3 of
82+
the client library), the client library sends a request to the jobs.query REST
83+
API. The client library automatically populates the `requestId` parameter in
84+
the request body. The `requestId` remains constant across API retries, ensuring
85+
that requests are idempotent.
86+
87+
Because no job ID is available, the client library cannot fall back to jobs.get when the query succeeded but every attempt returned an error response. In this case, the client library raises an exception.
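A toy simulation of this behavior is sketched below. The names (`ApiError`, `FakeQueryBackend`, `run_query`) are hypothetical, and the fake backend remembers every `requestId` forever, whereas the real API only provides a limited window of idempotency.

```python
import uuid


class ApiError(Exception):
    """Stand-in for a failed API request (hypothetical)."""


class FakeQueryBackend:
    """Toy backend that deduplicates requests by requestId."""

    def __init__(self, failures):
        self.failures = failures  # number of responses to drop
        self.executions = 0       # how many times the query actually ran
        self.seen = {}

    def query(self, body):
        request_id = body["requestId"]
        if request_id not in self.seen:
            self.executions += 1
            self.seen[request_id] = {"rows": [[1]]}
        if self.failures > 0:
            self.failures -= 1
            raise ApiError("response lost")
        return self.seen[request_id]


def run_query(backend, sql, max_attempts=3):
    # The requestId is generated once and reused on every attempt.
    body = {"query": sql, "requestId": str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            return backend.query(body)
        except ApiError as exc:
            last_error = exc
    # No job ID is known, so there is nothing to look up with jobs.get.
    raise last_error


backend = FakeQueryBackend(failures=2)
result = run_query(backend, "SELECT 1")
print(backend.executions)  # 1
```

If all attempts fail, `run_query` re-raises the last error, mirroring the case described above where the client has no job ID to fall back on.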
88+
89+
90+
#### Retrying the jobs.getQueryResults API via the retry parameter
91+
92+
The jobs.getQueryResults REST API is read-only. Thus, it is always safe to
93+
retry. As noted in the "Background" section, HTTP error response codes can
94+
indicate that the job itself has failed, so this may retry more often than is
95+
strictly needed
96+
([Issue #1122](https://github.com/googleapis/python-bigquery/issues/1122)
97+
has been opened to investigate this).
98+
99+
100+
### Re-issuing queries via the `job_retry` parameter
101+
102+
The second set of retries is at the "job" layer, called "re-issue" in this
103+
document. The client library sends an identical query request (except for the
104+
job or request identifier) if the query job has failed for a re-issuable reason.
105+
106+
107+
#### Deciding when it is safe to re-issue a query
108+
109+
The conditions when it is safe to re-issue a query are different from the conditions when it is safe to retry an individual API request. As such, the `job_retry` parameter is provided to configure this behavior.
110+
111+
The `job_retry` parameter is only used if (1) a query job fails and (2) a job ID is not provided by the developer. This is because the client library must generate a new job ID (or request ID, depending on the method used to create the query job) to avoid getting the same failed job.
112+
113+
The `job_retry` logic runs only after the client makes a request to the `jobs.getQueryResults` REST API and that request fails. The client examines the exception to determine whether the failure was caused by a failed job and whether the failure reason (e.g. "backendError" or "rateLimitExceeded") indicates that re-issuing the query may help.
114+
115+
If it is determined that the query job can be re-issued safely, the original logic to issue the query is executed. If the jobs.insert REST API was originally used, a new job ID is generated. Otherwise, if the jobs.query REST API was originally used, a new request ID is generated. All other parts of the request body remain identical to the original request body for the failed query job, and the process repeats until `job_retry` is exhausted.
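The re-issue loop can be sketched as below. This is an illustration under stated assumptions rather than the library's implementation: `JobFailed`, `query_with_reissue`, the `run_job` callable, and the `max_reissues` limit are all hypothetical, and only the two failure reasons named in this document are treated as re-issuable.

```python
import uuid

# Only the reasons named in this document; the real set may differ.
RETRYABLE_REASONS = frozenset({"backendError", "rateLimitExceeded"})


class JobFailed(Exception):
    """Stand-in for a getQueryResults error caused by a failed job."""

    def __init__(self, reason):
        super().__init__(reason)
        self.errors = [{"reason": reason}]


def query_with_reissue(run_job, sql, job_id=None, max_reissues=2):
    """Run `sql`, re-issuing it under a new job ID after retryable failures.

    `run_job(job_id, sql)` is a hypothetical callable that starts the job,
    waits for it, and raises JobFailed if the job itself failed.
    """
    user_supplied_id = job_id is not None
    failures = 0
    while True:
        # A fresh ID per attempt avoids fetching the same failed job again.
        current_id = job_id if user_supplied_id else str(uuid.uuid4())
        try:
            return run_job(current_id, sql)
        except JobFailed as exc:
            failures += 1
            retryable = exc.errors[0]["reason"] in RETRYABLE_REASONS
            # Never re-issue when the developer chose the job ID, when the
            # reason is not re-issuable, or when the budget is exhausted.
            if user_supplied_id or not retryable or failures > max_reissues:
                raise
```

The SQL text is passed unchanged on every attempt; only the identifier differs, which matches the requirement that all other parts of the request stay identical.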

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ API Reference
2626

2727
reference
2828
dbapi
29+
design/index
2930

3031
Migration Guide
3132
---------------
