JSON Extension - DuckDB

Uploaded by

gustavoleo
Copyright
© All Rights Reserved
Version 1.0 (stable)

JSON Extension

The json extension is a loadable extension that implements SQL functions that are useful for reading values from existing JSON, and creating new JSON data.

Installing and Loading

The json extension is shipped by default in DuckDB builds; otherwise, it will be transparently autoloaded on first use. If you would like to install and load it manually, run:

INSTALL json;
LOAD json;

Example Uses

Read a JSON file from disk, auto-infer options:

SELECT * FROM 'todos.json';

read_json with custom options:

SELECT *
FROM read_json('todos.json',
               format = 'array',
               columns = {userId: '…
                          id: 'UBIG…
                          title: 'V…
                          completed…

Write the result of a query to a JSON file:

COPY (SELECT * FROM todos) TO 'todo…


See more examples of loading JSON data on the JSON data page.

Create a table with a column for storing JSON data:

CREATE TABLE example (j JSON);

Insert JSON data into the table:

INSERT INTO example VALUES
    ('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');

Retrieve the family key's value:

SELECT j.family FROM example;

"anatidae"

Extract the family key's value with a JSONPath expression:

SELECT j->'$.family' FROM example;

"anatidae"

Extract the family key's value with a JSONPath expression as a VARCHAR:

SELECT j->>'$.family' FROM example;

anatidae

JSON Type

The json extension makes use of the JSON logical type. The JSON logical type is interpreted as JSON, i.e., parsed, in JSON functions rather than interpreted as VARCHAR, i.e., a regular string (modulo the equality-comparison caveat at the bottom of this page). All JSON creation functions return values of this type.

We also allow any of DuckDB's types to be cast to JSON, and JSON to be cast back to any of DuckDB's types. For example, to cast JSON to DuckDB's STRUCT type, run:

SELECT '{"duck": 42}'::JSON::STRUCT…

{'duck': 42}

And back:

SELECT {duck: 42}::JSON;

{"duck":42}

This works for our nested types as shown in the example, but also for non-nested types:

SELECT '2023-05-12'::DATE::JSON;

"2023-05-12"

The only exception to this behavior is the cast from VARCHAR to JSON, which does not alter the data, but instead parses and validates the contents of the VARCHAR as JSON.
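
The difference between converting a value to JSON and casting a VARCHAR to JSON can be seen in a short sketch. The literal values below are made up for illustration; json_valid and to_json are described later on this page.

```sql
-- Converting a non-VARCHAR value produces its JSON representation.
SELECT 42::JSON AS from_integer;

-- Casting a VARCHAR parses and validates it as JSON text; the content is not altered.
SELECT '[1, 2, 3]'::VARCHAR::JSON AS from_varchar;

-- To turn a plain string into a JSON string value, use to_json instead of a cast.
SELECT to_json('duck') AS json_string;

-- json_valid shows which strings would pass validation.
SELECT json_valid('[1, 2, 3]') AS ok, json_valid('duck') AS not_ok;
```

The last query should report true for the array text and false for the bare word duck, since duck on its own is not valid JSON.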

JSON Table Functions

The following table functions are used to read JSON:

Function                            Description
read_json_objects(filename)         Read a JSON object from filename, where filename can also be a list of files or a glob pattern.
read_ndjson_objects(filename)       Alias for read_json_objects with parameter format set to 'newline_delimited'.
read_json_objects_auto(filename)    Alias for read_json_objects with parameter format set to 'auto'.

These functions have the following parameters:

Name                    Description
compression             The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.json.gz will use gzip, t.json will use none). Options are 'none', 'gzip', 'zstd', and 'auto'.
filename                Whether or not an extra filename column should be included in the result.
format                  Can be one of ['auto', 'unstructured', 'newline_delimited', 'array'].
hive_partitioning       Whether or not to interpret the path as a Hive partitioned path.
ignore_errors           Whether to ignore parse errors (only possible when format is 'newline_delimited').
maximum_sample_files    The maximum number of JSON files sampled for auto-detection.
maximum_object_size     The maximum size of a JSON object (in bytes).

The format parameter specifies how to read the JSON from a file. With 'unstructured', the top-level JSON is read, e.g.:

{
    "duck": 42
}
{
    "goose": [1, 2, 3]
}

will result in two objects being read.

With 'newline_delimited', NDJSON is read, where each JSON is separated by a newline (\n), e.g.:

{"duck": 42}
{"goose": [1, 2, 3]}

will also result in two objects being read.

With 'array', each array element is read, e.g.:

[
    {
        "duck": 42
    },
    {
        "goose": [1, 2, 3]
    }
]

This again results in two objects being read.
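
To try the format parameter without preparing files by hand, the following sketch first writes two small files with COPY (see JSON Import/Export on this page) and then reads them back. The file names ducks_nd.json and ducks_array.json and the sample rows are hypothetical.

```sql
-- Hypothetical file names; COPY writes newline-delimited JSON by default.
COPY (SELECT 42 AS duck UNION ALL SELECT 43) TO 'ducks_nd.json' (FORMAT JSON);

-- ARRAY true writes the same rows as a single top-level JSON array.
COPY (SELECT 42 AS duck UNION ALL SELECT 43) TO 'ducks_array.json' (FORMAT JSON, ARRAY true);

-- Each line of the NDJSON file is read as one object.
SELECT * FROM read_json_objects('ducks_nd.json', format = 'newline_delimited');

-- Each element of the top-level array is read as one object.
SELECT * FROM read_json_objects('ducks_array.json', format = 'array');
```

Both SELECT statements should return two rows, one per object, even though the files are laid out differently on disk.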

Example usage:

SELECT * FROM read_json_objects('my…

{"duck":42,"goose":[1,2,3]}

SELECT * FROM read_json_objects(['m…

{"duck":42,"goose":[1,2,3]}
{"duck":43,"goose":[4,5,6],"swan":3.3}

SELECT * FROM read_ndjson_objects('…

{"duck":42,"goose":[1,2,3]}
{"duck":43,"goose":[4,5,6],"swan":3.3}
DuckDB also supports reading JSON as a table, using the following functions:

Function                      Description
read_json(filename)           Read JSON from filename, where filename can also be a list of files, or a glob pattern.
read_json_auto(filename)      Alias for read_json with all auto-detection enabled.
read_ndjson(filename)         Alias for read_json with parameter format set to 'newline_delimited'.
read_ndjson_auto(filename)    Alias for read_json_auto with parameter format set to 'newline_delimited'.

Besides the maximum_object_size, format, ignore_errors and compression, these functions have additional parameters:

Name               Description                                                                                                                                                                          Type
auto_detect        Whether to auto-detect the names of the keys and data types of the values automatically.                                                                                            BOOL
columns            A struct that specifies the key names and value types contained within the JSON file (e.g., {key1: 'INTEGER', key2: 'VARCHAR'}). If auto_detect is enabled these will be inferred.   STRUCT
dateformat         Specifies the date format to use when parsing dates. See Date Format.                                                                                                               VARCHAR
maximum_depth      Maximum nesting depth to which the automatic schema detection detects types. Set to -1 to fully detect nested JSON types.                                                           BIGINT
records            Can be one of ['auto', 'true', 'false'].                                                                                                                                             VARCHAR
sample_size        Option to define number of sample objects for automatic JSON type detection. Set to -1 to scan the entire input file.                                                               UBIGINT
timestampformat    Specifies the date format to use when parsing timestamps. See Date Format.                                                                                                          VARCHAR
union_by_name      Whether the schemas of multiple JSON files should be unified.                                                                                                                       BOOL

Example usage:

SELECT * FROM read_json('my_file1.j…

duck
42

DuckDB can convert JSON arrays directly to its internal LIST type, and missing keys become NULL:

SELECT *
FROM read_json(
        ['my_file1.json', 'my_file2…
        columns = {duck: 'INTEGER',…
    );

duck    goose        swan
42      [1, 2, 3]    NULL
43      [4, 5, 6]    3.3

DuckDB can automatically detect the types like so:

SELECT goose, duck FROM read_json('…
SELECT goose, duck FROM '*.json.gz';

goose        duck
[1, 2, 3]    42
[4, 5, 6]    43


DuckDB can read (and auto-detect) a variety of formats, specified with the format parameter. A JSON file that contains an 'array', e.g.:

[
    {
        "duck": 42,
        "goose": 4.2
    },
    {
        "duck": 43,
        "goose": 4.3
    }
]

can be queried exactly the same as a JSON file that contains 'unstructured' JSON, e.g.:

{
    "duck": 42,
    "goose": 4.2
}
{
    "duck": 43,
    "goose": 4.3
}

Both can be read as the table:

duck    goose
42      4.2
43      4.3

If your JSON file does not contain 'records', i.e., it contains any type of JSON other than objects, DuckDB can still read it. This is specified with the records parameter.

The records parameter specifies whether the JSON contains records that should be unpacked into individual columns, i.e., reading the following file with records:

{"duck": 42, "goose": [1, 2, 3]}
{"duck": 43, "goose": [4, 5, 6]}

Results in two columns:

duck    goose
42      [1,2,3]
43      [4,5,6]

You can read the same file with records set to 'false', to get a single column, which is a STRUCT containing the data:

json
{'duck': 42, 'goose': [1,2,3]}
{'duck': 43, 'goose': [4,5,6]}

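
The two settings of records can be compared side by side with a small self-contained sketch. The file name records_example.json is hypothetical, and the file is created with COPY so that the example does not depend on existing data.

```sql
-- Hypothetical file name; write two records as newline-delimited JSON.
COPY (
    SELECT 42 AS duck, [1, 2, 3] AS goose
    UNION ALL
    SELECT 43, [4, 5, 6]
) TO 'records_example.json' (FORMAT JSON);

-- records = 'true': each top-level key becomes its own column (duck, goose).
SELECT * FROM read_json('records_example.json', records = 'true');

-- records = 'false': each object is returned whole in a single STRUCT column.
SELECT * FROM read_json('records_example.json', records = 'false');
```

Leaving records at its default of 'auto' lets DuckDB choose between the two layouts based on the sampled data.
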
For additional examples reading more complex data, please see the Shredding Deeply Nested JSON, One Vector at a Time blog post.

JSON Import/Export

When the json extension is installed, FORMAT JSON is supported for COPY FROM, COPY TO, EXPORT DATABASE and IMPORT DATABASE. See Copy and Import/Export.

By default, COPY expects newline-delimited JSON. If you prefer copying data to/from a JSON array, you can specify ARRAY true, e.g.,

COPY (SELECT * FROM range(5)) TO 'my.json' (ARRAY true);

will create the following file:

[
    {"range":0},
    {"range":1},
    {"range":2},
    {"range":3},
    {"range":4}
]

This can be read like so:

CREATE TABLE test (range BIGINT);
COPY test FROM 'my.json' (ARRAY true);

The format can be detected automatically like so:

COPY test FROM 'my.json' (AUTO_DETECT true);

JSON Creation Functions

The following functions are used to create JSON.

Function                          Description
to_json(any)                      Create JSON from a value of any type. Our LIST is converted to a JSON array, and our STRUCT and MAP are converted to a JSON object.
json_quote(any)                   Alias for to_json.
array_to_json(list)               Alias for to_json that only accepts LIST.
row_to_json(struct)               Alias for to_json that only accepts STRUCT.
json_array([any, ...])            Create a JSON array from any number of values.
json_object([key, value, ...])    Create a JSON object from any number of key, value pairs.
json_merge_patch(json, json)      Merge two JSON documents together.

Examples:

SELECT to_json('duck');

"duck"

SELECT to_json([1, 2, 3]);

[1,2,3]

SELECT to_json({duck : 42});

{"duck":42}

SELECT to_json(map(['duck'],[42]));

{"duck":42}

SELECT json_array(42, 'duck', NULL);

[42,"duck",null]

SELECT json_object('duck', 42);

{"duck":42}

SELECT json_merge_patch('{"duck": 4…

{"goose":123,"duck":42}
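
The aliases in the table have no examples of their own, so here is a brief sketch of how they might be called. All input values are made up for illustration.

```sql
-- The aliases behave like to_json, restricted by input type.
SELECT array_to_json([1, 2, 3]) AS from_list;
SELECT row_to_json({duck: 42}) AS from_struct;
SELECT json_quote('duck') AS from_string;

-- Creation functions can be nested to build larger documents.
SELECT json_object('name', 'duck', 'tags', json_array('bird', 'aquatic')) AS nested;
```

Because every creation function returns the JSON logical type, the results can be passed directly to the extraction functions described next.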

JSON Extraction Functions

There are two extraction functions, which have their respective operators. The operators can only be used if the string is stored as the JSON logical type. These functions support the same two location notations as the JSON scalar functions: JSON Pointer and JSONPath.

Function                           Alias
json_extract(json, path)           json_extr…
json_extract_string(json, path)    json_extr…

Note that the equality comparison operator (=) has a higher precedence than the -> JSON extract operator. Therefore, surround the uses of the -> operator with parentheses when making equality comparisons. For example:

SELECT ((JSON '{"field": 42}')->'fi…

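
As an illustration of the precedence note, the sketch below parenthesizes each extraction before comparing it. The events table and its contents are hypothetical; ->> is used so that the extracted value is a VARCHAR that can be compared with a plain string.

```sql
-- The parentheses make sure the extraction happens before the comparison.
SELECT (('{"field": 42}'::JSON)->>'field') = '42' AS matches;

-- The same pattern in a filter, using a hypothetical table.
CREATE TABLE events (payload JSON);
INSERT INTO events VALUES ('{"kind": "quack"}'), ('{"kind": "honk"}');

SELECT count(*) AS quacks
FROM events
WHERE (payload->>'kind') = 'quack';
```

Without the parentheses, the comparison would bind to the path string first, which is not what the query intends.
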
Warning

DuckDB's JSON data type uses 0-based indexing.

Examples:

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');

SELECT json_extract(j, '$.family') FROM example;

"anatidae"

SELECT j->'$.family' FROM example;

"anatidae"

SELECT j->'$.species[0]' FROM example;

"duck"

SELECT j->'$.species[*]' FROM example;

["duck", "goose", "swan", null]

SELECT j->>'$.species[*]' FROM example;

[duck, goose, swan, null]

SELECT j->'$.species'->0 FROM example;

"duck"

SELECT j->'species'->['0','1'] FROM example;

["duck", "goose"]

SELECT json_extract_string(j, '$.family') FROM example;

anatidae

SELECT j->>'$.family' FROM example;

anatidae

SELECT j->>'$.species[0]' FROM example;

duck

SELECT j->'species'->>0 FROM example;

duck

SELECT j->'species'->>['0','1'] FROM example;

[duck, goose]

Note that DuckDB's JSON data type uses 0-based indexing.

If multiple values need to be extracted from the same JSON, it is more efficient to extract a list of paths.

The following will cause the JSON to be parsed twice, resulting in a slower query that uses more memory:

SELECT
    json_extract(j, 'family') AS family,
    json_extract(j, 'species') AS species
FROM example;

family        species
"anatidae"    ["duck","goose","swan",null]

The following produces the same result but is faster and more memory-efficient:

WITH extracted AS (
    SELECT json_extract(j, ['family', 'species']) AS extracted_list
    FROM example
)
SELECT
    extracted_list[1] AS family,
    extracted_list[2] AS species
FROM extracted;

JSON Scalar Functions

The following scalar JSON functions can be used to gain information about the stored JSON values. With the exception of json_valid(json), all JSON functions produce an error when invalid JSON is supplied.

We support two kinds of notations to describe locations within JSON: JSON Pointer and JSONPath.

Function                                     Description
json_array_length(json[, path])              Return the number of elements in the JSON array json, or 0 if it is not a JSON array. If path is specified, return the number of elements in the JSON array at the given path. If path is a LIST, the result will be LIST of array lengths.
json_contains(json_haystack, json_needle)    Returns true if json_needle is contained in json_haystack. Both parameters are of JSON type, but json_needle can also be a numeric value or a string, however the string must be wrapped in double quotes.
json_keys(json[, path])                      Returns the keys of json as a LIST of VARCHAR, if json is a JSON object. If path is specified, return the keys of the JSON object at the given path. If path is a LIST, the result will be LIST of LIST of VARCHAR.
json_structure(json)                         Return the structure of json. Defaults to JSON if the structure is inconsistent (e.g., incompatible types in an array).
json_type(json[, path])                      Return the type of the supplied json, which is one of ARRAY, BIGINT, BOOLEAN, DOUBLE, OBJECT, UBIGINT, VARCHAR, and NULL. If path is specified, return the type of the element at the given path. If path is a LIST, the result will be LIST of types.
json_valid(json)                             Return whether json is valid JSON.
json(json)                                   Parse and minify json.

The JSONPointer syntax separates each field with a /. For example, to extract the first element of the array with key "duck", you can do:

SELECT json_extract('{"duck": [1, 2…

1

The JSONPath syntax separates fields with a ., and accesses array elements with [i], and always starts with $. Using the same example, we can do the following:

SELECT json_extract('{"duck": [1, 2…

1

Note that DuckDB's JSON data type uses 0-based indexing.

JSONPath is more expressive, and can also access from the back of lists:

SELECT json_extract('{"duck": [1, 2…

3

JSONPath also allows escaping syntax tokens, using double quotes:

SELECT json_extract('{"duck.goose":…

2

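
The two notations can be compared on a single input with the sketch below. The input literals and paths are assumptions, chosen to be consistent with the results shown above (1, 1, 3 and 2).

```sql
-- JSON Pointer: fields and array positions are separated by '/'.
SELECT json_extract('{"duck": [1, 2, 3]}', '/duck/0') AS first_by_pointer;

-- JSONPath: starts with '$', uses '.' for fields and [i] for array elements.
SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[0]') AS first_by_path;

-- JSONPath can count from the back of a list with '#-i'.
SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[#-1]') AS last_by_path;

-- Double quotes escape a key that contains a syntax token such as '.'.
SELECT json_extract('{"duck.goose": [1, 2, 3]}', '$."duck.goose"[1]') AS escaped_key;
```

The first two queries address the same element, which shows that the choice between the notations is mostly a matter of preference until back-indexing or escaping is needed.
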
Examples using the anatidae biological family:

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');

SELECT json(j) FROM example;

{"family":"anatidae","species":["duck","goose","swan",null]}

SELECT j.family FROM example;

"anatidae"

SELECT j.species[0] FROM example;

"duck"

SELECT json_valid(j) FROM example;

true

SELECT json_valid('{');

false

SELECT json_array_length('["duck", …

4

SELECT json_array_length(j, 'species') FROM example;

4

SELECT json_array_length(j, '/species') FROM example;

4

SELECT json_array_length(j, '$.species') FROM example;

4

SELECT json_array_length(j, ['$.species']) FROM example;

[4]

SELECT json_type(j) FROM example;

OBJECT

SELECT json_keys(j) FROM example;

[family, species]

SELECT json_structure(j) FROM example;

{"family":"VARCHAR","species":["VARCHAR"]}

SELECT json_structure('["duck", {"f…

["JSON"]

SELECT json_contains('{"key": "valu…

true

SELECT json_contains('{"key": 1}', …

true

SELECT json_contains('{"top_key": {…

true

JSON Aggregate Functions

There are three JSON aggregate functions.

Function                         Description
json_group_array(any)            Return a JSON array with all values of any in the aggregation.
json_group_object(key, value)    Return a JSON object with all key, value pairs in the aggregation.
json_group_structure(json)       Return the combined json_structure of all json in the aggregation.

Examples:

CREATE TABLE example1 (k VARCHAR, v…
INSERT INTO example1 VALUES ('duck'…

SELECT json_group_array(v) FROM example1;

[42, 7]

SELECT json_group_object(k, v) FROM example1;

{"duck":42,"goose":7}

CREATE TABLE example2 (j JSON);
INSERT INTO example2 VALUES
    ('{"family": "anatidae", "speci…
    ('{"family": "canidae", "specie…

SELECT json_group_structure(j) FROM example2;

{"family":"VARCHAR","species":["VAR…
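
The aggregate functions are often combined with GROUP BY. The following sketch builds one JSON document per group; the sightings table and its rows are hypothetical.

```sql
-- Hypothetical table and values, used only to illustrate grouping.
CREATE TABLE sightings (family VARCHAR, species VARCHAR, n INTEGER);
INSERT INTO sightings VALUES
    ('anatidae', 'duck', 42),
    ('anatidae', 'goose', 7),
    ('canidae', 'fox', 3);

-- One JSON object per family mapping each species to its count,
-- plus a JSON array of the species names in that family.
SELECT
    family,
    json_group_object(species, n) AS counts,
    json_group_array(species) AS names
FROM sightings
GROUP BY family
ORDER BY family;
```

The order of keys and array elements within each group is not fixed by this query, so it should not be relied upon.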

Transforming JSON

In many cases, it is inefficient to extract values from JSON one-by-one. Instead, we can "extract" all values at once, transforming JSON to the nested types LIST and STRUCT.

Function                                  Description
json_transform(json, structure)           Transform json according to the specified structure.
from_json(json, structure)                Alias for json_transform.
json_transform_strict(json, structure)    Same as json_transform, but throws an error when type casting fails.
from_json_strict(json, structure)         Alias for json_transform_strict.

The structure argument is JSON of the same form as returned by json_structure. The structure argument can be modified to transform the JSON into the desired structure and types. It is possible to extract fewer key/value pairs than are present in the JSON, and it is also possible to extract more: missing keys become NULL.

Examples:

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
    ('{"family": "anatidae", "speci…
    ('{"family": "canidae", "specie…

SELECT json_transform(j, '{"family"…

{'family': anatidae, 'coolness': 42…
{'family': canidae, 'coolness': NULL}

SELECT json_transform(j, '{"family"…

{'family': NULL, 'coolness': 42.42}
{'family': NULL, 'coolness': NULL}

SELECT json_transform_strict(j, '{"…

Invalid Input Error: Failed to cast…

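
The behavior described above (missing keys become NULL, and strict mode raises an error on a failed cast) can be sketched with inline values. The keys legs and wings and all literals are made up for illustration.

```sql
-- The structure asks for one existing key and one missing key;
-- the missing key (wings) becomes NULL and the extra key (legs) is dropped.
SELECT json_transform(
    '{"family": "anatidae", "legs": 2}'::JSON,
    '{"family": "VARCHAR", "wings": "INTEGER"}'
) AS transformed;

-- The non-strict variant yields NULL when a value cannot be cast to the requested type.
SELECT json_transform('{"legs": "two"}'::JSON, '{"legs": "INTEGER"}') AS lenient;

-- The strict variant raises an error for the same input instead of returning NULL.
-- SELECT json_transform_strict('{"legs": "two"}'::JSON, '{"legs": "INTEGER"}');
```

The strict call is left commented out so that the block runs to completion; uncomment it to see the cast error.
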
Serializing and Deserializing SQL to JSON and Vice Versa

The json extension also provides functions to serialize and deserialize SELECT statements between SQL and JSON, as well as executing JSON serialized statements.

Function
json_deserialize_sql(json)
json_execute_serialized_sql(varchar)
json_serialize_sql(varchar, skip_empty := boolean, skip_null := boolean, format := boolean)
PRAGMA json_execute_serialized_sql(varchar)

The json_serialize_sql(varchar) function takes three optional parameters, skip_empty, skip_null, and format that can be used to control the output of the serialized statements.

If you run the json_execute_serialized_sql(varchar) table function inside of a transaction the serialized statements will not be able to see any transaction local changes. This is because the statements are executed in a separate query context. You can use the PRAGMA json_execute_serialized_sql(varchar) pragma version to execute the statements in the same query context as the pragma, although with the limitation that the serialized JSON must be provided as a constant string, i.e., you cannot do PRAGMA json_execute_serialized_sql(json_serialize_sql(...)).

Note that these functions do not preserve syntactic sugar such as FROM * SELECT ..., so a statement round-tripped through json_deserialize_sql(json_serialize_sql(...)) may not be identical to the original statement, but should always be semantically equivalent and produce the same output.

Examples:

Simple example:

SELECT json_serialize_sql('SELECT 2…

'{"error":false,"statements":[{"nod…

Example with multiple statements and skip options:

SELECT json_serialize_sql('SELECT 1…

'{"error":false,"statements":[{"nod…

Example with a syntax error:

SELECT json_serialize_sql('TOTALLY …

'{"error":true,"error_type":"parser…

Example with deserialize:

SELECT json_deserialize_sql(json_se…

'SELECT (1 + 2)'

Example with deserialize and syntax sugar:

SELECT json_deserialize_sql(json_se…

'SELECT (1 + 2) FROM x'

Example with execute:

SELECT * FROM json_execute_serializ…

3

Example with error:

SELECT * FROM json_execute_serializ…

Error: Parser Error: Error parsing …
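
A complete round trip can be written out as follows. The statement 'SELECT 1 + 2' is an assumption, chosen because it is consistent with the results 'SELECT (1 + 2)' and 3 shown in the examples above.

```sql
-- Serialize a statement to JSON and turn it back into SQL text.
SELECT json_deserialize_sql(json_serialize_sql('SELECT 1 + 2')) AS round_tripped;

-- Execute the serialized form directly as a table function.
SELECT * FROM json_execute_serialized_sql(json_serialize_sql('SELECT 1 + 2'));
```

Note that the round-tripped text adds parentheses, which illustrates that the output is semantically equivalent to the input rather than textually identical.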

Indexing

Warning

Following PostgreSQL's conventions, DuckDB uses 1-based indexing for arrays and lists but 0-based indexing for the JSON data type.
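
The difference is easiest to see when the same three values are stored once as a LIST and once as JSON. The values are made up for illustration.

```sql
SELECT
    [10, 20, 30][1] AS list_first,                          -- LIST: position 1 is the first element
    ('[10, 20, 30]'::JSON)->0 AS json_first,                -- JSON: position 0 is the first element
    ('[10, 20, 30]'::JSON)->'$[0]' AS json_first_by_path;   -- the same element via JSONPath
```

All three columns refer to the value 10, even though the index used is 1 for the list and 0 for the JSON array.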
Equality Comparison

Warning

Currently, equality comparison of JSON values can differ based on the context. In some cases, it is based on raw text comparison, while in other cases, it uses logical content comparison.

The following query returns true for all fields:

SELECT
    a != b, -- Space is part of phy…
    c != d, -- Same.
    c[0] = d[0], -- Equality becaus…
    a = c[0], -- Indeed, field is e…
    b != c[0], -- ... but different…
FROM (
    SELECT
        '[]'::JSON AS a,
        '[ ]'::JSON AS b,
        '[[]]'::JSON AS c,
        '[[ ]]'::JSON AS d
    );

(a != b)    (c != d)    (c[0] = d[0])    (a = c[0])    (b != c[0])
true        true        true             true          true

About this page

Last modified: 2024-08-02