Data Formats, Relational DB, Advanced Queries
Topics today:
Commonly used file formats
Recap on Relational Databases / RDBMS
Relational DB + XML
Relational DB + JSON
Advanced SQL Queries
Commonly used File Formats
Text files
Comma Separated Values (CSV)
JSON
XML
Binary, open formats
Proprietary binary formats
10/12/2023
CSV Files
Primary Key
Foreign Key
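A minimal sketch of how primary and foreign keys show up in CSV data. The two CSV texts, their column names, and the sample rows are assumptions for illustration. A CSV file cannot enforce keys itself, so the check is done by the reading program.

```python
import csv
import io

# Hypothetical CSV contents; in practice these would be two files on disk.
PERSON_CSV = """email,vorname,nachname
anna@example.org,Anna,Muster
ben@example.org,Ben,Beispiel
"""

HATFREUND_CSV = """email,emailfreund
anna@example.org,ben@example.org
ben@example.org,carl@example.org
"""


def read_rows(text):
    """Parse CSV text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def dangling_foreign_keys(persons, friendships):
    """Return friendship rows that reference an email missing from persons."""
    primary_keys = {p["email"] for p in persons}
    return [
        f for f in friendships
        if f["email"] not in primary_keys or f["emailfreund"] not in primary_keys
    ]


persons = read_rows(PERSON_CSV)
friendships = read_rows(HATFREUND_CSV)
violations = dangling_foreign_keys(persons, friendships)
print(violations)
```

The second friendship row points to a person that does not exist in the person data; a relational DBMS would reject it through a foreign key constraint.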
Informal Recap on Normal Forms (see lecture “Databases” for full definitions.)
1NF: Each attribute value must contain only atomic data. No complex content.
2NF: Non-prime attributes must not depend on a proper subset of a key.
Counter example: enrolment(matno, lvnr, semester, firstname, lastname)
Assuming the key {matno, lvnr, semester}, firstname and lastname depend on matno alone.
3NF: Non-prime attributes must depend only on the key, i.e. no transitive dependencies between key and non-prime attributes.
Counter example: student(matno, firstname, lastname, studyplanID, studyPlanTitle)
matno → studyplanID → studyPlanTitle is a transitive dependency.
BCNF: Every non-trivial dependency must have the form A → B, where A is a (super)key of the relation.
Counter example: enrolment(matno, assistant, labCourse)
{matno, labCourse} → assistant, assistant → labCourse
Keys: {matno, labCourse}, {matno, assistant}
assistant → labCourse violates BCNF because assistant alone is not a key.
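The 3NF counter example can be repaired by decomposition. The sketch below uses SQLite (via Python's sqlite3) as a stand-in DBMS; the table names student_flat and studyplan and all sample rows are made up. It moves the transitively dependent title into its own relation and checks that a join gives back the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Relation from the 3NF counter example (sample rows are made up).
    CREATE TABLE student_flat (
        matno INTEGER PRIMARY KEY,
        firstname TEXT, lastname TEXT,
        studyplanID INTEGER, studyPlanTitle TEXT
    );
    INSERT INTO student_flat VALUES
        (1, 'Anna', 'Muster', 10, 'Informatics'),
        (2, 'Ben', 'Beispiel', 10, 'Informatics'),
        (3, 'Carl', 'Probe', 20, 'Mathematics');

    -- Decomposition: the title now depends on the key of its own relation.
    CREATE TABLE studyplan (
        studyplanID INTEGER PRIMARY KEY,
        studyPlanTitle TEXT
    );
    CREATE TABLE student (
        matno INTEGER PRIMARY KEY,
        firstname TEXT, lastname TEXT,
        studyplanID INTEGER REFERENCES studyplan(studyplanID)
    );
    INSERT INTO studyplan
        SELECT DISTINCT studyplanID, studyPlanTitle FROM student_flat;
    INSERT INTO student
        SELECT matno, firstname, lastname, studyplanID FROM student_flat;
""")

original = conn.execute("SELECT * FROM student_flat ORDER BY matno").fetchall()
rejoined = conn.execute("""
    SELECT s.matno, s.firstname, s.lastname, s.studyplanID, p.studyPlanTitle
    FROM student s JOIN studyplan p ON s.studyplanID = p.studyplanID
    ORDER BY s.matno
""").fetchall()
plan_count = conn.execute("SELECT count(*) FROM studyplan").fetchone()[0]
print(rejoined == original, plan_count)
```

Each study plan title is now stored once, so renaming a study plan touches a single row.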
Recap: Schema Quality / NF II
Durability
Recap on Concurrent Access
Different Isolation levels to balance consistency and performance (see lecture DB-T)
Serializable, Repeatable Read, Read Committed, Read Uncommitted
Recovery
Idea:
Each update is written to a log.
The log supports forward (redo) and backward (undo) recovery.
Very powerful:
The log is (ideally) stored on a different disk than the data.
Given a backup and the most recent log, we can restore all committed transactions.
The log can be used to replicate the DB to other nodes (Hot/Warm Standby).
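A toy sketch of forward recovery, to illustrate the idea of "backup plus log". The log record format and the function name recover are hypothetical; real DBMS logs also carry old values for undo, sequence numbers, and checkpoints.

```python
def recover(backup, log):
    """Redo-only sketch: replay updates of committed transactions onto a backup copy."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(backup)  # work on a copy, never on the backup itself
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _txn, key, new_value = rec
            state[key] = new_value
    return state


# Hypothetical log format: ("update", txn_id, key, new_value) or ("commit", txn_id).
backup = {"a": 100, "b": 50}
log = [
    ("update", "T1", "a", 70),
    ("update", "T1", "b", 80),
    ("commit", "T1"),
    ("update", "T2", "a", 0),  # T2 has no commit record: crash happened first
]
restored = recover(backup, log)
print(restored)
```

Only T1 is replayed; the update of the uncommitted transaction T2 is ignored, so the restored state contains exactly the committed work.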
SQL
Declarative query language, much more convenient than writing queries in a general-purpose programming language.
Allows for complex queries; however, the core language (without recursive queries) is intentionally not Turing complete.
Relational DBMS
Tons of vendors and open-source projects (Oracle, Postgres, MS SQL, MySQL, SQLite, …)
XML support was added with SQL:2003 and extended in SQL:2008.
JSON support was added with SQL:2016.
Publishing SQL as XML
Publishing relational data as XML: use XML constructor functions in SQL select statements:
XMLELEMENT()
XMLATTRIBUTES()
XMLAGG()
XMLROOT()
XMLCONCAT()
XMLPI(), XMLCOMMENT()
SELECT xmlelement(
  name person, xmlattributes(email, geschlecht), vorname || ' ' || nachname
)
FROM person;
Publishing SQL as XML Example 2
Include friends
select xmlelement(
  name person,
  xmlattributes(p.email, geschlecht, vorname as fn, nachname as ln),
  xmlelement(name friends,
    xmlagg(
      xmlelement(name friend, h.emailfreund)
    )
  )
)
from person p, hatfreund h
where p.email = h.email group by p.email;
Publishing SQL as XML Example 4
Query:
select xmlelement(
  name persons,
  xmlagg(pq.pe)
)
from (
  select xmlelement(
    name person,
    xmlattributes(p.email, geschlecht, vorname as fn, nachname as ln),
    xmlelement(name friends,
      xmlagg(
        xmlelement(name friend, h.emailfreund)
      )
    )
  ) as pe
  from person p, hatfreund h
  where p.email = h.email group by p.email
) as pq
Result:
<persons>
<person email="[email protected]" geschlecht="M" fn="Phillip" ln="Winkler"><friends><friend>[email protected]</friend><friend>M.Kuehn@sms.at</friend></friends>
</person>
<person email="[email protected]" geschlecht="W" fn="Lina" ln="Hartmann"><friends><friend>[email protected]</friend><friend>Laura.Heinri[email protected]</friend><friend>[email protected]</friend></friends>
</person>
...
</persons>
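For comparison, the same persons/person/friends nesting can be built in application code. This is only a sketch: SQLite (used here through Python's sqlite3) has no SQL/XML constructor functions, so xml.etree.ElementTree takes over the role of xmlelement, xmlattributes, and xmlagg. Table contents are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (email TEXT PRIMARY KEY, vorname TEXT,
                         nachname TEXT, geschlecht TEXT);
    CREATE TABLE hatfreund (email TEXT, emailfreund TEXT);
    INSERT INTO person VALUES
        ('anna@example.org', 'Anna', 'Muster', 'W'),
        ('ben@example.org', 'Ben', 'Beispiel', 'M');
    INSERT INTO hatfreund VALUES
        ('anna@example.org', 'ben@example.org'),
        ('ben@example.org', 'anna@example.org');
""")

rows = conn.execute("""
    SELECT p.email, p.geschlecht, p.vorname, p.nachname, h.emailfreund
    FROM person p JOIN hatfreund h ON p.email = h.email
    ORDER BY p.email, h.emailfreund
""").fetchall()

persons = ET.Element("persons")
friends_by_email = {}  # one <friends> element per person, created on first sight
for email, geschlecht, vorname, nachname, emailfreund in rows:
    if email not in friends_by_email:
        person = ET.SubElement(
            persons, "person",
            email=email, geschlecht=geschlecht, fn=vorname, ln=nachname,
        )
        friends_by_email[email] = ET.SubElement(person, "friends")
    ET.SubElement(friends_by_email[email], "friend").text = emailfreund

xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```

Doing the grouping inside the DBMS, as in the SQL query, avoids shipping one row per friendship to the client.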
Storing XML in Relations
Inserting some rows containing XML strings:
Insert into weatherReportXML (id, report) values
(1,'<station name="Klagenfurt Airport" temp="21"/>');
Insert into weatherReportXML (id, report) values
(2,'<station name="Villacher Alpe" temp="6"/>');
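A small sketch of what "XML stored as a string" means for querying. SQLite (through Python's sqlite3) is used as a stand-in and only sees text, so parsing and filtering happen in Python; PostgreSQL instead offers an xml column type and functions such as xpath() to do this inside the database. The 10 degree threshold is an arbitrary choice for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weatherReportXML (id INTEGER PRIMARY KEY, report TEXT)")
conn.executemany(
    "INSERT INTO weatherReportXML (id, report) VALUES (?, ?)",
    [
        (1, '<station name="Klagenfurt Airport" temp="21"/>'),
        (2, '<station name="Villacher Alpe" temp="6"/>'),
    ],
)

# The DBMS only stores text here, so each report is parsed on the client.
cold_stations = []
for (report,) in conn.execute("SELECT report FROM weatherReportXML ORDER BY id"):
    station = ET.fromstring(report)
    if int(station.get("temp")) < 10:
        cold_stations.append(station.get("name"))
print(cold_stations)
```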
More SQL Related DB features
Inserting some rows containing JSON strings:
Insert into weatherReport (id, report) values
(1,'{"station" : "Klagenfurt Airport", "temp" : "21" }');
Insert into weatherReport (id, report) values
(2,'{"station" : "Villacher Alpe", "temp" : "6" }');
Operator | Right operand type | Description | Example | Example result
-> | int | Get JSON array element (indexed from zero, negative integers count from the end) | '[{"a":"foo"},{"b":"bar"},{"c":"baz"}]'::json->2 | {"c":"baz"}
-> | text | Get JSON object field by key | '{"a": {"b":"foo"}}'::json->'a' | {"b":"foo"}
#> | text[] | Get JSON object at specified path | '{"a": {"b":{"c": "foo"}}}'::json#>'{a,b}' | {"c": "foo"}
Select report -> 'station' as station from weatherReport;
Result (JSON values, still quoted): "Klagenfurt Airport", "Villacher Alpe"
Select report ->> 'station' as station from weatherReport;
Result (text): Klagenfurt Airport, Villacher Alpe
We can chain -> if the JSON is nested (this query assumes station is an object with a name field):
Select report -> 'station' ->> 'name' as name from weatherReport;
We can use ->> and -> in the where clause as well:
Select report ->> 'station' as station from weatherReport
where report ->> 'temp' = '6';
However, we might need to cast data accordingly:
Select report ->> 'station' as station from weatherReport
where cast(report ->> 'temp' as integer) < 10;
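The cast in the where clause can be tried out without a PostgreSQL server. The sketch below uses SQLite through Python's sqlite3 and assumes a build with the JSON functions enabled (the default in recent versions); json_extract(report, '$.temp') plays the role of report ->> 'temp'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weatherReport (id INTEGER PRIMARY KEY, report TEXT)")
conn.executemany(
    "INSERT INTO weatherReport (id, report) VALUES (?, ?)",
    [
        (1, '{"station": "Klagenfurt Airport", "temp": "21"}'),
        (2, '{"station": "Villacher Alpe", "temp": "6"}'),
    ],
)

# temp is stored as the JSON string "6", so it is cast before comparing numbers.
cold = conn.execute("""
    SELECT json_extract(report, '$.station') AS station
    FROM weatherReport
    WHERE CAST(json_extract(report, '$.temp') AS INTEGER) < 10
""").fetchall()
print(cold)
```

Without the cast the comparison would be done on text, where '21' sorts before '6'.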
Publishing Relational data as JSON I
to_json(anyelement), to_jsonb(anyelement):
Returns the value as json or jsonb. Arrays and composites are converted (recursively) to
arrays and objects; otherwise, if there is a cast from the type to json, the cast function will
be used to perform the conversion; otherwise, a scalar value is produced. For any scalar
type other than a number, a Boolean, or a null value, the text representation will be used, in
such a fashion that it is a valid json or jsonb value.
Example:
person(email, vorname, nachname, geburtsdatum, geschlecht)
select to_json(p) from person p;
Publishing Relational Data as JSON III
json_agg aggregates sets of tuples to a JSON array.
Output each person and a nested array of friends:
SELECT json_agg(pt) FROM
(
  select p.*, json_agg(h.emailfreund) as friends
  from person p, hatfreund h
  where p.email = h.email group by p.email
) as pt;
Result:
[{
"email": "[email protected]",
"vorname": "Phillip",
"nachname": "Winkler",
"geburtsdatum": "1985-10-02",
"geschlecht": "M",
"friends": [
"[email protected]",
"[email protected]"
]
},{
"email": "[email protected]",
"vorname": "Lina",
"nachname": "Hartmann",
"geburtsdatum": "1988-01-25",
"geschlecht": "W",
"friends": [
"[email protected]",
"[email protected]",
"[email protected]"
]
}
...
]
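A runnable approximation of the nested-array query, sketched with SQLite through Python's sqlite3 (assuming a build with the JSON functions). json_group_array is SQLite's counterpart to json_agg for a single column; the outer array is assembled in Python here. All sample rows are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (email TEXT PRIMARY KEY, vorname TEXT, nachname TEXT);
    CREATE TABLE hatfreund (email TEXT, emailfreund TEXT);
    INSERT INTO person VALUES
        ('anna@example.org', 'Anna', 'Muster'),
        ('ben@example.org', 'Ben', 'Beispiel');
    INSERT INTO hatfreund VALUES
        ('anna@example.org', 'ben@example.org'),
        ('anna@example.org', 'carl@example.org'),
        ('ben@example.org', 'anna@example.org');
""")

# One row per person; the friends column is a JSON array encoded as text.
rows = conn.execute("""
    SELECT p.email, p.vorname, p.nachname,
           json_group_array(h.emailfreund) AS friends
    FROM person p JOIN hatfreund h ON p.email = h.email
    GROUP BY p.email, p.vorname, p.nachname
    ORDER BY p.email
""").fetchall()

people = [
    {"email": email, "vorname": vorname, "nachname": nachname,
     "friends": sorted(json.loads(friends))}  # sorted: aggregate order is not fixed
    for email, vorname, nachname, friends in rows
]
document = json.dumps(people, indent=2)
print(document)
```

As in the SQL version, persons without any friend do not appear because of the inner join.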
JSON / XML Conclusions
Here we discuss some types of queries you did not see in the lecture “Databases”:
Window Functions
Recursive Queries
Aggregate Functions for Statistics
Window Functions
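So far, we know
Simple functions that can only access the current row
Aggregate functions
However, aggregate functions always condense multiple rows into one (one per group when grouping).
Window functions compute a value over a set of rows related to the current row, but keep every row in the result.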
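A minimal sketch of the difference, run on SQLite (3.25 or newer) through Python's sqlite3; the OVER (PARTITION BY ...) syntax is the same in PostgreSQL. The sample rows are made up. Every person row stays in the result and is annotated with the size of its group and its rank by birth date within the group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (email TEXT PRIMARY KEY, geschlecht TEXT, geburtsdatum TEXT);
    INSERT INTO person VALUES
        ('a@example.org', 'W', '1988-01-25'),
        ('b@example.org', 'M', '1985-10-02'),
        ('c@example.org', 'W', '1990-06-30'),
        ('d@example.org', 'M', '1992-03-14');
""")

rows = conn.execute("""
    SELECT email, geschlecht,
           count(*) OVER (PARTITION BY geschlecht) AS group_size,
           rank() OVER (PARTITION BY geschlecht ORDER BY geburtsdatum) AS age_rank
    FROM person
    ORDER BY email
""").fetchall()
for row in rows:
    print(row)
```

With GROUP BY geschlecht the result would have two rows; with the window functions it still has four.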
Simple Window Functions I
select h.emailfreund
from person p, hatfreund h
where p.email = h.email and
  p.vorname = 'Hanna' and
  p.nachname = 'Schmidt';
with boys as (
  Select p.* from person p where geschlecht = 'M'
),
girls as (
  Select p.* from person p where geschlecht = 'W'
)
select * from boys UNION select * from girls;
Evaluation of a recursive query (WITH RECURSIVE):
1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
2. So long as the working table is not empty, repeat these steps:
a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
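The evaluation steps can be observed on a small friendship graph. This is a sketch on SQLite through Python's sqlite3 (WITH RECURSIVE has the same shape in PostgreSQL); the CTE name reach and all rows are made up. The data contains a cycle, and because UNION discards rows that were already produced, the working table eventually becomes empty and the query terminates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hatfreund (email TEXT, emailfreund TEXT);
    INSERT INTO hatfreund VALUES
        ('hanna@example.org', 'ben@example.org'),
        ('ben@example.org', 'carl@example.org'),
        ('carl@example.org', 'hanna@example.org'),
        ('dora@example.org', 'hanna@example.org');
""")

reachable = conn.execute("""
    WITH RECURSIVE reach(email) AS (
        -- non-recursive term: direct friends of Hanna
        SELECT emailfreund FROM hatfreund WHERE email = 'hanna@example.org'
        UNION
        -- recursive term: friends of everyone in the working table
        SELECT h.emailfreund
        FROM hatfreund h JOIN reach r ON h.email = r.email
    )
    SELECT email FROM reach ORDER BY email
""").fetchall()
print(reachable)
```

With UNION ALL instead of UNION, the cycle hanna, ben, carl would keep producing rows and the query would not stop on its own.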
Aggregate Functions for Statistics
Today's DBMSs come with a whole range of aggregate functions useful for statistics.
Here we introduce the list for PostgreSQL:
• Covariance: covar_pop(Y,X)
Correlation: An example
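To make the definitions concrete, here is a plain Python sketch of what PostgreSQL's covar_pop(Y, X) and corr(Y, X) aggregates compute. The function bodies follow the textbook formulas (population covariance and Pearson correlation), and the height and weight values are made-up sample data.

```python
import math


def covar_pop(ys, xs):
    """Population covariance, same argument order as covar_pop(Y, X)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def corr(ys, xs):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covar_pop(ys, xs) / math.sqrt(covar_pop(xs, xs) * covar_pop(ys, ys))


# Hypothetical measurements with a perfectly linear relationship.
heights = [150, 160, 170, 180]
weights = [50, 60, 70, 80]
print(covar_pop(weights, heights), corr(weights, heights))
```

The covariance depends on the units of the data, while the correlation is always between -1 and 1.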