Making Sense of Schema-on-Read: Modeling JSON

Kent Graziano, Chief Technical Evangelist | November 2018

About me
• Chief Technical Evangelist, Snowflake Computing
• Oracle ACE Director, Alumni (DW/BI)
• OakTable Network
• Blogger – The Data Warrior
• Certified Data Vault Master and DV 2.0 Practitioner
• Former Member: Boulder BI Brain Trust (#BBBT)
• Member: DAMA Houston & DAMA International
• Data Architecture and Data Warehouse Specialist
• 30+ years in IT
• 25+ years of Oracle-related work
• 20+ years of data warehousing experience
• Author & Co-Author of a bunch of books (Amazon)
• Past-President of ODTUG and Rocky Mountain Oracle User Group

© 2018 Snowflake Computing Inc. All Rights Reserved


3 years in stealth + 3 years GA

• Founded 2012 by industry veterans with over 120 database patents
• First customers 2014, general availability 2015
• Over $850M in venture funding from leading investors
• 700+ employees
• Over 2000 customers today
• Queries processed in Snowflake per day: 60 million
• Largest single table: 68 trillion rows
• Largest number of tables in a single DB: 200,000
• Single customer with the most data: > 40 PB
• Single customer with the most users: > 10,000
AGENDA

• Schema-on-Read vs Schema-on-Write
• Why we still need data modeling
• What is JSON?
• Example JSON #1
  • Simple 3NF model
  • Simple Data Vault model
• Example JSON #2
  • 3NF model
  • Data Vault model


Defining Terms

• Schema-on-Read
  • Popularized in document stores and NoSQL databases
  • No upfront modeling
  • No predefined structure
  • Called semi-structured or flexible-structure data
  • Can change contents and structure over time
  • Load & Go
  • Agile!


Defining Terms

• Schema-on-Write
  • What we do in an RDBMS today
  • Requires knowing the structure in advance
  • Upfront modeling & table design required
  • Must map source data to the database tables
  • ETL/ELT may break if the source data changes
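
The contrast between the two approaches can be sketched in a few lines of Python with the standard-library sqlite3 module. This is only an illustration, not how any particular product works: the table names (person_sow, person_sor), the strict key mapping, and the second sample record are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-write: the structure is fixed before any data arrives.
conn.execute("CREATE TABLE person_sow (first_name TEXT, last_name TEXT)")
# Schema-on-read: one column holds the raw document; structure is applied at query time.
conn.execute("CREATE TABLE person_sor (v TEXT)")

EXPECTED_KEYS = {"firstName", "lastName"}  # hypothetical source-to-table mapping


def load_schema_on_write(doc):
    """Map a source record to fixed columns; fail if the source shape changed."""
    unmapped = set(doc) - EXPECTED_KEYS
    if unmapped:
        raise ValueError(f"unmapped source fields: {sorted(unmapped)}")
    conn.execute(
        "INSERT INTO person_sow VALUES (?, ?)", (doc["firstName"], doc["lastName"])
    )


def load_schema_on_read(doc):
    """Load and go: store the document as-is."""
    conn.execute("INSERT INTO person_sor VALUES (?)", (json.dumps(doc),))


docs = [
    {"firstName": "John", "lastName": "Smith"},
    {"firstName": "Jane", "lastName": "Doe", "nickname": "JD"},  # the source changed
]

sow_errors = []
for doc in docs:
    load_schema_on_read(doc)          # never breaks on new attributes
    try:
        load_schema_on_write(doc)     # breaks when the mapping is out of date
    except ValueError as exc:
        sow_errors.append(str(exc))

# The schema is only applied here, when the data is read.
last_names = [
    json.loads(v)["lastName"]
    for (v,) in conn.execute("SELECT v FROM person_sor ORDER BY rowid")
]
print(last_names)
print(sow_errors)
```

Both documents land in the schema-on-read table, while the strict loader rejects the record whose shape it was not designed for.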



40 Zettabytes by 2020

Data sources: Web, 3rd party apps, Mobile, Enterprise apps, ERP, IoT


It’s not the data itself, it’s how you take full advantage of the insight it provides


Who needs data modeling anyway?

• We all do!
• To take advantage of all this data, we have to use it
• Schema-on-Read
  • There is a SCHEMA – which means a model!
  • To query the data requires knowing the structure
  • Which means the MODEL of the data or “document”
• Few reporting or BI tools can infer the schema
  • So we have to transform it, somehow
  • Load to tables and columns?
  • Expose with a SQL view?
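
To make "knowing the structure" concrete, here is a minimal Python sketch that walks a parsed JSON document and lists every path with the type found there. The helper name describe, the "$" path notation, and the small sample document are assumptions made for illustration only.

```python
import json


def describe(node, path="$"):
    """Yield (path, type_name) for every element of a parsed JSON value."""
    if isinstance(node, dict):
        yield path, "object"
        for key, value in node.items():
            yield from describe(value, f"{path}.{key}")
    elif isinstance(node, list):
        yield path, "array"
        for item in node:
            yield from describe(item, f"{path}[]")
    else:
        yield path, type(node).__name__


# Hypothetical sample document.
raw = """
{"firstName": "John",
 "height_cm": 167.64,
 "phoneNumbers": [{"type": "home"}, {"type": "office"}]}
"""

# Collect the set of types seen at each path: an inferred schema, i.e. a model.
schema = {}
for path, type_name in describe(json.loads(raw)):
    schema.setdefault(path, set()).add(type_name)

for path in sorted(schema):
    print(path, sorted(schema[path]))
```

The resulting path list is exactly the information a modeler needs before deciding which paths become tables, columns, or view expressions.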
What is JSON?

• JavaScript Object Notation

A minimal, readable format for structuring data. It is used primarily to transmit data between a server and a web application, as an alternative to XML.


Why worry about JSON?

• There is LOTS of it out there
  • JavaScript is popular
  • REST APIs for IoT & Mobile
  • Application and web logs – Social Media
• Self-describing so very portable
• Open datasets published in JSON
  • Data.gov
  • Datasf.org
  • Data.cityofNewYork.us
• Opportunity for analysis!


JSON Support with SQL

Structured data

    Apple    101.12   250   FIH-2316
    Pear      56.22   202   IHO-6912
    Orange    98.21   600   WHQ-6090

Semi-structured data (e.g. JSON, Avro, XML)

    { "firstName": "John",
      "lastName": "Smith",
      "height_cm": 167.64,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
      },
      "phoneNumbers": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "office", "number": "646 555-4567" }
      ]
    }

All Your Data!

    select v:lastName::string as last_name
    from json_demo;
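
For readers without a Snowflake account, the same "follow a path, then read a typed value" idea can be sketched in plain Python. This is not Snowflake's API; the list json_demo merely stands in for the table of that name, and get_path is a hypothetical helper.

```python
import json

# Stands in for the json_demo table: one raw JSON document per row.
json_demo = [
    """{ "firstName": "John", "lastName": "Smith", "height_cm": 167.64,
         "address": { "streetAddress": "21 2nd Street", "city": "New York",
                      "state": "NY", "postalCode": "10021-3100" },
         "phoneNumbers": [
           { "type": "home", "number": "212 555-1234" },
           { "type": "office", "number": "646 555-4567" } ] }"""
]


def get_path(doc, *keys):
    """Follow keys into nested objects; return None if any key is missing."""
    for key in keys:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


rows = [json.loads(v) for v in json_demo]

# Top-level attribute, like v:lastName in the query above.
last_names = [get_path(r, "lastName") for r in rows]
# Nested attribute reached through the "address" object.
cities = [get_path(r, "address", "city") for r in rows]
# Nested array turned into one output row per element.
phones = [
    (get_path(r, "lastName"), p["type"], p["number"])
    for r in rows
    for p in r["phoneNumbers"]
]

print(last_names, cities, phones)
```

Note that every expression hard-codes a key name, which is the model of the document showing up in the query.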



JSON Example #1

    {
      "colors": [
        {
          "color": "black",
          "category": "hue",
          "type": "primary",
          "code": {
            "rgba": [255,255,255,1],
            "hex": "#000"
          }
        },
        {
          "color": "green",
          "category": "hue",
          "type": "secondary",
          "code": {
            "rgba": [0,255,0,1],
            "hex": "#0F0"
          }
        }
      ]
    }

• This is a JSON document, enclosed by { }
• Elements are Key : Value pairs
• Elements may have nested keys, delineated by more { }
• Some values may be arrays, delineated by [ ]


JSON as 3NF – Logical Model

    "colors": [
      {
        "color": "black",
        "category": "hue",
        "type": "primary",
        "code":
          {
            "rgba": [255,255,255,1],
            "hex": "#000"
          }
      }
    ]


JSON as 3NF – Schema Model

    "colors": [
      {
        "color": "black",
        "category": "hue",
        "type": "primary",
        "code":
          {
            "rgba": [255,255,255,1],
            "hex": "#000"
          }
      }
    ]
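
One possible 3NF schema for this document is sketched below with Python's sqlite3 module. The table and column names are assumptions made for illustration, guided only by the Color, Color Category, and Color Type entities the deck mentions; this is not a reproduction of the slide's diagram. The sample values are copied from Example #1 as shown.

```python
import json
import sqlite3

doc = json.loads("""
{"colors": [
  {"color": "black", "category": "hue", "type": "primary",
   "code": {"rgba": [255,255,255,1], "hex": "#000"}},
  {"color": "green", "category": "hue", "type": "secondary",
   "code": {"rgba": [0,255,0,1], "hex": "#0F0"}}
]}
""")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE color_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT UNIQUE NOT NULL
);
CREATE TABLE color_type (
    type_id   INTEGER PRIMARY KEY,
    type_name TEXT UNIQUE NOT NULL
);
CREATE TABLE color (
    color_id    INTEGER PRIMARY KEY,
    color_name  TEXT UNIQUE NOT NULL,
    category_id INTEGER NOT NULL REFERENCES color_category(category_id),
    type_id     INTEGER NOT NULL REFERENCES color_type(type_id),
    hex TEXT, red INTEGER, green INTEGER, blue INTEGER, alpha REAL
);
""")


def lookup_id(table, id_col, name_col, name):
    """Insert the reference value if it is new, then return its surrogate key."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({name_col}) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {name_col} = ?", (name,)
    ).fetchone()
    return row[0]


for c in doc["colors"]:
    category_id = lookup_id("color_category", "category_id", "category_name", c["category"])
    type_id = lookup_id("color_type", "type_id", "type_name", c["type"])
    red, green, blue, alpha = c["code"]["rgba"]   # fixed-length array becomes columns
    conn.execute(
        "INSERT INTO color (color_name, category_id, type_id, hex, red, green, blue, alpha) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (c["color"], category_id, type_id, c["code"]["hex"], red, green, blue, alpha),
    )

result = conn.execute("""
    SELECT c.color_name, cat.category_name, t.type_name, c.hex
    FROM color c
    JOIN color_category cat ON cat.category_id = c.category_id
    JOIN color_type t ON t.type_id = c.type_id
    ORDER BY c.color_id
""").fetchall()
print(result)
```

The repeated "hue" value is stored once in color_category and referenced by key, which is the normalization step the logical model expresses.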


JSON as Denormalized – Relational Model

    "colors": [
      {
        "color": "black",
        "category": "hue",
        "type": "primary",
        "code":
          {
            "rgba": [255,255,255,1],
            "hex": "#000"
          }
      }
    ]


JSON as Data Vault – Data Vault Style

    "colors": [
      {
        "color": "black",
        "category": "hue",
        "type": "primary",
        "code":
          {
            "rgba": [255,255,255,1],
            "hex": "#000"
          }
      }
    ]
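
A Data Vault rendering of the same document might look like the sketch below: Hubs for the business keys, a Link for their relationship, and a Satellite for the descriptive code values. The table names, the MD5 hash-key convention, the load timestamp, and the record source are all assumptions for illustration, not the model drawn on the slide.

```python
import hashlib
import json
import sqlite3

LOAD_DTS = "2018-11-01T00:00:00"   # hypothetical load timestamp
SOURCE = "colors.json"             # hypothetical record source


def hash_key(*business_keys):
    """MD5 over trimmed, upper-cased, delimited business keys (an assumed convention)."""
    joined = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_color (color_hk TEXT PRIMARY KEY, color_name TEXT UNIQUE,
                        load_dts TEXT, record_source TEXT);
CREATE TABLE hub_color_category (category_hk TEXT PRIMARY KEY, category_name TEXT UNIQUE,
                                 load_dts TEXT, record_source TEXT);
CREATE TABLE hub_color_type (type_hk TEXT PRIMARY KEY, type_name TEXT UNIQUE,
                             load_dts TEXT, record_source TEXT);
CREATE TABLE link_color_class (link_hk TEXT PRIMARY KEY, color_hk TEXT, category_hk TEXT,
                               type_hk TEXT, load_dts TEXT, record_source TEXT);
CREATE TABLE sat_color_code (color_hk TEXT, load_dts TEXT, hex TEXT, rgba TEXT,
                             record_source TEXT, PRIMARY KEY (color_hk, load_dts));
""")


def load_hub(table, business_key):
    """Insert the business key once and return its hash key."""
    hk = hash_key(business_key)
    conn.execute(
        f"INSERT OR IGNORE INTO {table} VALUES (?, ?, ?, ?)",
        (hk, business_key, LOAD_DTS, SOURCE),
    )
    return hk


doc = json.loads("""
{"colors": [
  {"color": "black", "category": "hue", "type": "primary",
   "code": {"rgba": [255,255,255,1], "hex": "#000"}},
  {"color": "green", "category": "hue", "type": "secondary",
   "code": {"rgba": [0,255,0,1], "hex": "#0F0"}}
]}
""")

for c in doc["colors"]:
    color_hk = load_hub("hub_color", c["color"])
    category_hk = load_hub("hub_color_category", c["category"])
    type_hk = load_hub("hub_color_type", c["type"])
    conn.execute(
        "INSERT OR IGNORE INTO link_color_class VALUES (?, ?, ?, ?, ?, ?)",
        (hash_key(c["color"], c["category"], c["type"]),
         color_hk, category_hk, type_hk, LOAD_DTS, SOURCE),
    )
    conn.execute(
        "INSERT OR IGNORE INTO sat_color_code VALUES (?, ?, ?, ?, ?)",
        (color_hk, LOAD_DTS, c["code"]["hex"], json.dumps(c["code"]["rgba"]), SOURCE),
    )

tables = ["hub_color", "hub_color_category", "hub_color_type",
          "link_color_class", "sat_color_code"]
counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}
print(counts)
```

Keeping the descriptive attributes in a Satellite, separate from the Hub, is what later allows new attributes to be added without touching existing tables.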



What if the JSON changes?

• That is the point of schema-on-read
  • No changes to ingest the data
  • NoSQL, Snowflake, Oracle
• Example
  • More attributes on Color Category or Color Type
  • Like “Description”
• In a 3NF model
  • Add new columns to entities/tables
  • ALTER TABLE required
• In a Data Vault model
  • Add new Sat tables on existing Hubs
  • CREATE TABLE required
  • No change required to existing tables
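
The two extension paths can be compared side by side in a small sqlite3 sketch. The starting tables are deliberately simplified, and every table and column name (including sat_color_category_desc) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Existing structures, simplified for the example
CREATE TABLE color_category (category_id INTEGER PRIMARY KEY, category_name TEXT UNIQUE);
CREATE TABLE hub_color_category (category_hk TEXT PRIMARY KEY, category_name TEXT UNIQUE);
""")


def columns(table):
    """Return the column names of a table, in definition order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


hub_before = columns("hub_color_category")

# 3NF: the new "description" attribute changes an existing table.
conn.execute("ALTER TABLE color_category ADD COLUMN description TEXT")

# Data Vault: the new attribute goes into a new Satellite on the existing Hub.
conn.execute("""
    CREATE TABLE sat_color_category_desc (
        category_hk TEXT REFERENCES hub_color_category(category_hk),
        load_dts    TEXT,
        description TEXT,
        PRIMARY KEY (category_hk, load_dts)
    )
""")

print(columns("color_category"))
print(columns("hub_color_category") == hub_before)
```

In the 3NF case the existing table's definition changes; in the Data Vault case the Hub is untouched and only a new table appears.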



JSON Example #2

    {
      "fullName": "Johnny Appleseed",
      "age": 42,
      "gender": "Male",
      "phoneNumber":
        {
          "areaCode": "415",
          "subscriberNumber": "5551234"
        },
      "children":
        [
          { "name": "Jayden", "gender": "Male", "age": "10" },
          { "name": "Emma", "gender": "Female", "age": "8" },
          { "name": "Madelyn", "gender": "Female", "age": "6" }
        ],
      "citiesLived": [
        { "cityName": "London",
          "yearsLived": [ "1989", "1993", "1998", "2002" ] },
        { "cityName": "San Francisco",
          "yearsLived": [ "1990", "1993", "1998", "2008" ] },
        { "cityName": "Portland",
          "yearsLived": [ "1993", "1998", "2003", "2005" ] }
      ]
    }

• Nested elements (phoneNumber)
• Nested array of elements (children)
• Nested array of values within a nested array of elements (yearsLived within citiesLived)
JSON as 3NF – Logical Model

    "fullName": "Johnny Appleseed",
    "age": 42,
    "gender": "Male",

    "phoneNumber":
      {
        "areaCode": "415",
        "subscriberNumber": "5551234"
      },

    "children":
      [
        { "name": "Jayden", "gender": "Male", "age": "10" },
        { "name": "Emma", "gender": "Female", "age": "8" },
        { "name": "Madelyn", "gender": "Female", "age": "6" }
      ],

    "cityName": "London",

    {
      "cityName": "Portland",
      "yearsLived": [ "1993", "1998", "2003", "2005" ]
    },


JSON as 3NF – Schema Model

• Can handle some JSON schema changes
  • Kids get a phone!
  • Kids move out!
• Extensions
  • More details on City
    • Add columns
  • More details on Children
    • Add columns or a dependent table
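
A minimal sketch of one 3NF schema for Example #2 follows, again using sqlite3. The nested object becomes columns, the array of elements becomes a dependent table, and the array of values inside an array of elements becomes an associative table. All table and column names are assumptions, not the schema from the slide.

```python
import json
import sqlite3

person_doc = json.loads("""
{
  "fullName": "Johnny Appleseed", "age": 42, "gender": "Male",
  "phoneNumber": {"areaCode": "415", "subscriberNumber": "5551234"},
  "children": [
    {"name": "Jayden", "gender": "Male", "age": "10"},
    {"name": "Emma", "gender": "Female", "age": "8"},
    {"name": "Madelyn", "gender": "Female", "age": "6"}
  ],
  "citiesLived": [
    {"cityName": "London", "yearsLived": ["1989", "1993", "1998", "2002"]},
    {"cityName": "San Francisco", "yearsLived": ["1990", "1993", "1998", "2008"]},
    {"cityName": "Portland", "yearsLived": ["1993", "1998", "2003", "2005"]}
  ]
}
""")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, full_name TEXT, age INTEGER,
                     gender TEXT, area_code TEXT, subscriber_number TEXT);
CREATE TABLE child (person_id INTEGER REFERENCES person(person_id), name TEXT,
                    gender TEXT, age INTEGER, PRIMARY KEY (person_id, name));
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT UNIQUE);
CREATE TABLE person_city_year (person_id INTEGER REFERENCES person(person_id),
                               city_id INTEGER REFERENCES city(city_id),
                               year_lived INTEGER,
                               PRIMARY KEY (person_id, city_id, year_lived));
""")

# Nested object: phoneNumber is flattened into columns on person.
phone = person_doc["phoneNumber"]
cur = conn.execute(
    "INSERT INTO person (full_name, age, gender, area_code, subscriber_number) "
    "VALUES (?, ?, ?, ?, ?)",
    (person_doc["fullName"], person_doc["age"], person_doc["gender"],
     phone["areaCode"], phone["subscriberNumber"]),
)
person_id = cur.lastrowid

# Nested array of elements: one dependent row per child (string ages cast to int).
for kid in person_doc["children"]:
    conn.execute("INSERT INTO child VALUES (?, ?, ?, ?)",
                 (person_id, kid["name"], kid["gender"], int(kid["age"])))

# Array of values inside an array of elements: one row per person, city, and year.
for lived in person_doc["citiesLived"]:
    conn.execute("INSERT OR IGNORE INTO city (city_name) VALUES (?)", (lived["cityName"],))
    city_id = conn.execute("SELECT city_id FROM city WHERE city_name = ?",
                           (lived["cityName"],)).fetchone()[0]
    for year in lived["yearsLived"]:
        conn.execute("INSERT INTO person_city_year VALUES (?, ?, ?)",
                     (person_id, city_id, int(year)))


def cities_in(year):
    """Return the names of the cities recorded for the given year, sorted by name."""
    rows = conn.execute(
        "SELECT c.city_name FROM person_city_year pcy "
        "JOIN city c ON c.city_id = pcy.city_id "
        "WHERE pcy.year_lived = ? ORDER BY c.city_name", (year,))
    return [name for (name,) in rows]


print(cities_in(2008))
```

If the children later get phones, the flattened phone columns on person would no longer be enough, which is the kind of change that calls for additional columns or a dependent table.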



JSON as Data Vault – Data Vault Style



JSON as Data Vault

Special case – Transaction Link: Year included in the unique keys (UKs)

• Can handle some JSON schema changes
  • Two parents, same kids
  • Kids get a phone!
  • Kids move out!
• Easy Extensions
  • More details on City
    • Add a Sat
    • Add Link(s)
  • More details on Children
    • Add a Sat on Link
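
The "year included in the unique key" idea can be sketched as follows. The Hub and Link names, the use of fullName as the person business key, and the MD5 hash-key convention are assumptions for illustration; the slide's actual diagram is not reproduced here.

```python
import hashlib
import sqlite3


def hash_key(*parts):
    """MD5 over trimmed, upper-cased, delimited key parts (an assumed convention)."""
    joined = "||".join(str(p).strip().upper() for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


person = "Johnny Appleseed"   # assumed business key: fullName
cities_lived = {
    "London": ["1989", "1993", "1998", "2002"],
    "San Francisco": ["1990", "1993", "1998", "2008"],
    "Portland": ["1993", "1998", "2003", "2005"],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_person (person_hk TEXT PRIMARY KEY, full_name TEXT UNIQUE);
CREATE TABLE hub_city (city_hk TEXT PRIMARY KEY, city_name TEXT UNIQUE);
CREATE TABLE link_person_city_year (
    link_hk    TEXT PRIMARY KEY,
    person_hk  TEXT REFERENCES hub_person(person_hk),
    city_hk    TEXT REFERENCES hub_city(city_hk),
    year_lived INTEGER,
    UNIQUE (person_hk, city_hk, year_lived)   -- the year is part of the unique key
);
""")


def load():
    """Load Hubs and the transaction-style Link; safe to run more than once."""
    person_hk = hash_key(person)
    conn.execute("INSERT OR IGNORE INTO hub_person VALUES (?, ?)", (person_hk, person))
    for city, years in cities_lived.items():
        city_hk = hash_key(city)
        conn.execute("INSERT OR IGNORE INTO hub_city VALUES (?, ?)", (city_hk, city))
        for year in years:
            conn.execute(
                "INSERT OR IGNORE INTO link_person_city_year VALUES (?, ?, ?, ?)",
                (hash_key(person, city, year), person_hk, city_hk, int(year)),
            )


load()
load()  # a repeated load adds nothing

link_rows = conn.execute("SELECT COUNT(*) FROM link_person_city_year").fetchone()[0]
distinct_pairs = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT person_hk, city_hk FROM link_person_city_year)"
).fetchone()[0]
print(link_rows, distinct_pairs)
```

Without the year in the key, the Link could hold only one row per person and city, and the individual years would be lost.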



Conclusion

• We still need data models and data modelers
• Schema-on-Read does not mean there is no model
  • To READ the data we must understand the SCHEMA
  • In the DB world that means we need a model
• Some model types can be easily extended for JSON changes
• Once the schema is understood
  • Can be expressed as any type of model
    • 3NF
    • Data Vault
    • Star
    • Denormalized
    • Object model
    • Etc.


SHAMELESS PLUG:

Available on Amazon.com

https://www.amazon.com/Better-Data-Modeling-Enhancing-Developer-ebook/dp/B00UK75LYI/

http://www.amazon.com/Better-Data-Modeling-Introduction-Engineering-ebook/dp/B018BREV1C/

Discover the performance, concurrency, and simplicity of Snowflake

As easy as 1-2-3!
01 Visit Snowflake.com
02 Click “Try for Free”
03 Sign up & register

Sign up and receive $400 worth of free usage for 30 days!

Snowflake is the only data warehouse built for the cloud. You can automatically scale compute up, out, or down, independent of storage. Plus, you have the power of a complete SQL database, with zero management, that can grow with you to support all of your data and all of your users. With Snowflake On Demand™, pay only for what you use.


Contact Information
Kent Graziano
Snowflake Computing
[email protected]
On Twitter @KentGraziano

More info at
http://snowflake.com

Visit my blog at
http://kentgraziano.com

THANK YOU