Database 2
Vu Tuyet Trinh
[email protected] Department of Information Systems, Faculty of Information Technology Hanoi University of Technology
Data Models
A data model is a plan for building a database*
a set of concepts used to describe the database structure
data types
constraints
*http://www.computerworld.com/databasetopics/data/story/0,10801,80205,00.html
A Brief History
[Figure: timeline of data models, 1965 to 2010]
Hierarchical model: IMS, System 2k, ...
Network model: DMS(65), CODASYL (71), IDMS, IDS
Relational model
Entity-Relationship Model
Object-Oriented model
Semi-structured Model
XML
???
Link
name
type: 1-1, 1-n, n-1, recursive
Operation
Navigating: FIND, FIND member, FIND owner, FIND NEXT
Function: GET
Example
[Figure: network-model example relating teacher, class, subject, student, and note through the links teach, include, study, have, and notes]
Discussion
Discuss the advantages and disadvantages of this model.
Hierarchical model
Represented as a tree
Parent/child relationship
Each node has only one parent node
1 DB = set of trees
Concepts
Record
Link
Operation: GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, ...
Example
[Figure: hierarchical-model example as a tree rooted at class, with teacher, student, subject, and note nodes connected by include, study, teach, have, and noted]
Discussion
Discuss the advantages and disadvantages of this model.
Relational Model
Represented as tables
Based on set theory
Concepts
Attribute/Field/Column
Name
Data type, domain of values
Relation: defined on a set of attributes
Tuple
Key
Operations: union, intersection, cartesian product, selection, projection, join, ...
Enrol
SID   Course
3936  101
1108  113
8507  101

Course
No   Name  Dept
113  BCS   CSCE
101  MCS   CSCE

Subject
No  Name      Dept
21  Systems   CSCE
23  Database  CSCE
29  VB        CSCE
18  Algebra   Maths
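The relational operations listed above can be tried on the Enrol and Course tables. The following is a minimal sketch, assuming relations are held as Python lists of dictionaries; the helper names `select`, `project`, and `join` are illustrative and not part of the slides.

```python
# Relations as lists of dicts; rows copied from the Enrol and Course tables.
enrol = [
    {"SID": 3936, "Course": 101},
    {"SID": 1108, "Course": 113},
    {"SID": 8507, "Course": 101},
]
course = [
    {"No": 113, "Name": "BCS", "Dept": "CSCE"},
    {"No": 101, "Name": "MCS", "Dept": "CSCE"},
]


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


def join(left, right, left_attr, right_attr):
    """Equi-join: pair tuples whose join attributes are equal."""
    return [{**l, **r} for l in left for r in right if l[left_attr] == r[right_attr]]


# Which students are enrolled in the course named MCS?
enrolled = join(enrol, course, "Course", "No")
students_on_mcs = project(select(enrolled, lambda t: t["Name"] == "MCS"), ["SID"])
print(students_on_mcs)
```

Composing the three helpers mirrors a relational algebra expression: a join, then a selection, then a projection.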
Discussion
Discuss the advantages and disadvantages of this model.
Entity-Relationship Model
ER diagram
Concepts
Entity: represents an object in the real world
Set of entities
Attribute: property of an entity
Key
Relationship: association between two (or more) entities
Set of relationships
Example
[Figure: ER diagram with entities Subject, student, Class, and Fac., relationships examine, program, include, and head, and attributes ID, Name, Credit]
Discussion
Discuss the advantages and disadvantages of this model.
Object-Oriented Model
Represented by class diagram
Concepts
Object: has a unique identity
Attribute: a property of the object
Method: an operation on the object
Class: represents a set of objects having the same set of attributes and methods
Example
class student {
    string ID;
    string Name;
    date Birthday;
    boolean Sex;
    string Address;
    string Class;
    string getName();
    string getBirthday();
    string getAddress();
    string getClass();
    void setAddress(string DC_moi);
    void setClass(string class);
}
Discussion
Discuss the advantages and disadvantages of this model.
Database Design
Extended Entity Relationship
Top Down Conceptual/Abstract View
Functional Dependencies
Bottom Up Implementation View Synthesise relations
Requirements Collection
2: Design 3: Implementation
Using a DBMS
ER Model
well suited to data modelling for use with databases
easy to represent and explain
readily translated to relations
Basic concepts
Attribute
represent a property/characteristic of an object in real world
Entity
defined as a set of attributes
Entity Set
Set of all entity instances of the same entity type
Relationship Set
Set of all relationship instances of the same relationship type
Name
Salary
Address
Age
Types of Attribute
Atomic
Ex: Database; 3083; Bundoora
Composite
Ex: Address = (Number, Street, Suburb), e.g. 5 + Plenty Road + Bundoora
EMPLOYEE
Salary
Suburb
Types of Attribute
Multi-valued
Ex: Degrees of a person BCS, PhD
TEACHER
Derived
Ex: a person's Age, derived from the attribute Date of birth
Relationships
Degree of relationship
Unary Binary Ternary
M
SUBJECT
prerequisite
STUDENT
Cardinality
1-1 1-n n-m
RENTAL SHOP
Enrols in
SUBJECT
owns
VIDEO
1
BORROWER
Enrols in Title
N M
ID Address
STUDENT
SUBJECT
Code
Weak entity
Definition
Existence depends on the existence of one or more other entities.
Primary key partially or totally derived from the owner entity.
Example
EMPLOYEE is the owner entity, and DEPENDENT is the weak entity.
Occupation Employee No Dependent Name
EMPLOYEE
has
DEPENDENT
Name
EmployeeNo
Address
Age
STEP2: For each weak entity in the ER model, create a relation which includes all the simple attributes. The primary key of the relation is the combination of the primary key(s) of the owner and the key of the weak entity itself.
Employee No Occupation Dependent Name
EMPLOYEE
has
DEPENDENT
STEP3: For each binary 1:1 relationship, identify the two relations that correspond to the entities participating in the relationship. Choose one of the relations and include in it, as a foreign key, the primary key of the other relation.
Dept# Deptname Manstaff# Name Address DEPARTMENT
MANAGED BY
MANAGER
STEP4: For each binary 1:N relationship, identify the relation that represents the participating entity on the N (i.e. many) side of the relationship. Include in that relation, as a foreign key, the primary key of the entity on the 1 side.
E# Ename Dept# Deptname
EMPLOYEE
WORKS FOR
DEPARTMENT
STEP5: For each binary M:N relationship, create a new relation to represent the relationship. The primary key of the new relation is the combination of the primary keys of the two connected entities.
E# Ename P# Ptitle
EMPLOYEE
WORKS ON
PROJECT
STEP6: For each multivalued attribute, create a new relation that includes the multivalued attribute and the primary key of the entity where the multivalued attribute is attached.
E# Ename
Degree
EMPLOYEE
STEP7: For each n-ary (n > 2) relationship, create a new relation to represent the relationship. The primary key of the new relation is the combination of the primary keys of the participating entities that hold the N side.
Sup# Supname SQuantity SDate P# Ptitle
SUPPLIER M
SUPPLY
PROJECT N
N PART
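STEP5 above (the M:N relationship WORKS ON between EMPLOYEE and PROJECT) can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the column names `e_no` and `p_no` stand in for E# and P#, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (e_no INTEGER PRIMARY KEY, ename TEXT);
CREATE TABLE project  (p_no INTEGER PRIMARY KEY, ptitle TEXT);
-- New relation for the M:N relationship; its key combines both primary keys.
CREATE TABLE works_on (
    e_no INTEGER REFERENCES employee(e_no),
    p_no INTEGER REFERENCES project(p_no),
    PRIMARY KEY (e_no, p_no)
);
""")

# Hypothetical sample data, not taken from the slides.
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "An"), (2, "Binh")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "Payroll"), (20, "Website")])
conn.executemany("INSERT INTO works_on VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

rows = conn.execute("""
    SELECT e.ename, p.ptitle
    FROM works_on w
    JOIN employee e ON e.e_no = w.e_no
    JOIN project p ON p.p_no = w.p_no
    ORDER BY e.ename, p.ptitle
""").fetchall()
print(rows)

# The composite primary key rejects a repeated (employee, project) pair.
try:
    conn.execute("INSERT INTO works_on VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The 1:N case of STEP4 would need no third table: the foreign key would simply be added to the relation on the N side.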
Exercises
Database Design
Extended Entity Relationship
Top Down Conceptual/Abstract View
Functional Dependencies
Bottom Up
Synthesise relations
1. List all attributes
2. Consider the relationships between them
3. Decompose attributes into tables in order to eliminate redundancy
ENROL(studno,courseno,labmark,exammark)
COURSE(courseno,subject,equip)
lecturer roomno appraiser lecturer roomno appraiser year yeartutor year yeartutor
SCHOOL(hons,faculty)
name    tutor  roomno  courseno  subject
jones   bush   2.26    cs250     prog
jones   bush   2.26    cs260     graphics
jones   wibby  2.26    cs270     elecs
brown   kahn   IT206   cs250     prog
brown   kahn   IT206   cs270     elecs
smith   goble  2.82    cs270     comms
blogg   goble  2.82    cs280     design
jones   zobel  2.34    cs250     prog
peters  kahn   A17     cs250     prog
null    capon  A14     null      null
null    null   null    cs290     specs
patel   null   null    null      null
F = { studno → name, tutor; tutor → roomno; roomno → tutor; courseno → subject; studno, courseno → labmark }
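A set of functional dependencies such as F is usually explored through attribute closures: the closure of a set of attributes is everything those attributes determine. The sketch below assumes the dependencies read studno → name, tutor; tutor → roomno; roomno → tutor; courseno → subject; studno, courseno → labmark. The function name `closure` is illustrative.

```python
def closure(attributes, fds):
    """Return the closure of a set of attributes under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Each FD is a pair (left-hand side, right-hand side) of attribute sets.
F = [
    ({"studno"}, {"name", "tutor"}),
    ({"tutor"}, {"roomno"}),
    ({"roomno"}, {"tutor"}),
    ({"courseno"}, {"subject"}),
    ({"studno", "courseno"}, {"labmark"}),
]

print(sorted(closure({"studno"}, F)))
print(sorted(closure({"studno", "courseno"}, F)))
```

Because the closure of {studno, courseno} contains every attribute, that pair is a candidate key of the combined relation, while studno alone is not.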
Process of Normalization
Represent all user views (e.g. forms, reports, etc.) as a collection of relations
Normalize these relations
Combine relations that have exactly the same primary key(s)
ENROL (studno, courseno, subject, labmark)
courseno → subject
studno, courseno → labmark
Example
ENROL (studno, courseno, subject, labmark)
courseno → subject
studno, courseno → labmark
is decomposed into
ENROL (studno, courseno, labmark)
studno, courseno → labmark
COURSE (courseno, subject)
courseno → subject
Example
STUDENT (studno, name, tutor)
studno → name, tutor
TUTOR (tutor, roomno)
tutor → roomno
roomno → tutor
ENROL (studno, courseno, labmark)
studno, courseno → labmark
COURSE (courseno, subject)
courseno → subject
Remarks
Only in rare situations is a relation in 3NF not also in 4NF or 5NF.
Most relations that are in 3NF are also in BCNF.
Dependency Preservation
The union of the dependencies that hold on the individual relations in decomposition $D$ must be equivalent to $F$.
Given $F$ on $R$, $F(R_i)$, where $R_i \subseteq R$, is the set of dependencies $X \to Y$ in $F^+$ such that the attributes in $X \cup Y$ are all contained in $R_i$.
Decomposition $D = \{R_1, R_2, \dots, R_m\}$ of $R$ is dependency preserving w.r.t. $F$ if $(F(R_1) \cup \dots \cup F(R_m))^+ = F^+$.
The restriction of the functional dependencies to a relation is the set of FDs that involve only attributes of that relation: $F_i$ for $R_i$.
$\bigcup_{i=1}^{n} F_i \neq F$ is possible, but $(\bigcup_{i=1}^{n} F_i)^+ = F^+$ must hold.
Dependency Preservation
STUDENT
studno  name    tutor  roomno  appraiser
s1      jones   bush   2.26    capon
s2      brown   kahn   IT206   watson
s3      smith   goble  2.82    capon
s4      bloggs  goble  2.82    capon
s5      jones   zobel  2.34    watson
s6      peters  kahn   IT206   watson

studno → name
studno → tutor
tutor → roomno
tutor → appraiser
roomno → tutor
roomno → appraiser
studno → appraiser
studno → roomno
TUTOR studno s1 s2 s3 s4 s5 s6
STUDENT
* TUTOR = STUDENT
Exercises
Database Architecture
Architecture
Application Query Processing
DBMS
Transaction Management
Storage Management
Data
Data
QP Trans. Mgt
Storage Management
Responsible for storing and accessing data.
Buffer manager
responsible for partitioning the available main memory into buffers
Storage Management Buffer Mgt File Mgt
Storage Mgtt
Trans. Mgt
File Management
responsible for interacting with file system
Stored Data
Data: the contents of database itself
Metadata:
the database schema that describes the structure of and constraints on the database
Statistics
information gathered and stored by the DBMS about data properties such as the size of, and values in, various relations or other components of the database
Indexes
data structures that support efficient access to the data.
QP Trans. Mgt
Query Processing
Composed of
Parser Optimizer Execution Engine
Query Processing
Stoarge Mgt
Parser
responsible for verifying query syntax and semantics and translating the query into a query plan
A query plan is a sequence of operations, implementing relational algebra operations, to be performed on the data.
Storage Mgt
QP Trans. Mgt
Stoarge Mgt
Execution engine
Responsible for executing each of the steps in the chosen query plan.
Interacts with most of the other components of the DBMS.
Storage Mgt
Transaction Management
Accept transaction commands from an application which tell the transaction manager when transactions begin and end, as well as information about the expectations of the application (some may not wish to require atomicity, for example). The transaction processor performs the following tasks:
Logging Recovery Concurrency control
Transaction Management
Logging
every change in the database is logged separately on disk. The log manager follows one of several policies designed to ensure that recovery is possible no matter when a system failure or crash occurs.
Recovery manager
able to examine the log of changes and restore the database to some consistent state.
Concurrency control
transactions must appear to execute in isolation. The scheduler (concurrency control manager) must ensure that the individual actions of multiple transactions are executed in such an order that the net effect is the same as if the transactions had in fact executed in their entirety, one at a time. A typical scheduler does its work by maintaining locks on certain pieces of the database.
END USERS
EXTERNAL LEVEL
.. ..
EXTERNAL VIEWn
External/Conceptual Mapping
CONCEPTUAL LEVEL
Conceptual Internal Mapping
CONCEPTUAL SCHEMA
INTERNAL LEVEL
INTERNAL SCHEMA
STORED DATABASE
Internal schema
describes the physical storage structure of the database.
Conceptual schema
describes the structure of the whole database for a community of users.
DBMS Utilities
Loading
To load existing data files into the database
File reorganization
To reorganize a database file in order to improve performance
Report generation
To generate reports based on the information from the database
Performance monitoring
To provide the DBA with statistical data about the DB usage.
Concurrency
Example
Transfer of 500 USD from Account A to Account B: read(A); if A > 500 then B := B + 500; A := A - 500. A crash may occur partway through.
Transaction
A sequence of read and write operations on data items that logically functions as one unit of work
Makes it possible to guarantee the consistency and correctness of the data
ACID Properties
Atomicity
Consistency
Isolation
Durability
Concurrency Control Recovery
Atomicity
guarantees that either all of the tasks of a transaction are performed or none of them are
Example
T: Read(A,t1); If t1 > 500 { Read(B,t2); t2:=t2+500; Write(B,t2); t1:=t1-500; Write(A,t1); }
A crash in the middle of T must not leave only part of its writes applied.
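Atomicity for the transfer transaction T can be observed with Python's built-in `sqlite3` module, where a connection used as a context manager commits on success and rolls back when an exception escapes. This is a sketch under assumptions: the table layout, the starting balances A = 5000 and B = 3000, and the simulated crash are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 5000), ("B", 3000)])
conn.commit()


def transfer(conn, amount, crash=False):
    """Move `amount` from A to B; both writes commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            (a,) = conn.execute(
                "SELECT balance FROM account WHERE name = 'A'").fetchone()
            if a > amount:
                conn.execute(
                    "UPDATE account SET balance = balance + ? WHERE name = 'B'",
                    (amount,))
                if crash:
                    raise RuntimeError("simulated crash between the two writes")
                conn.execute(
                    "UPDATE account SET balance = balance - ? WHERE name = 'A'",
                    (amount,))
    except RuntimeError:
        pass  # the partial update to B has already been rolled back


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(conn, 500, crash=True)
after_crash = balances(conn)   # unchanged: the write to B was undone
transfer(conn, 500)
after_success = balances(conn)
print(after_crash, after_success)
```

The sum A + B is the same before and after each call, which also illustrates the consistency property for this example.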
Consistency
ensures that the DB is in a consistent state before the start of the transaction and after the transaction is over
Example: A+B = C
T: Read(A,t1); If t1 > 500 { Read(B,t2); t2:=t2+500; Write(B,t2); t1:=t1-500; Write(A,t1); }
A+B = C
Isolation
ability of the application to make operations in a transaction appear isolated from all other operations
Example: A = 5000, B = 3000
T: Read(A,t1); If t1 > 500 { Read(B,t2); t2:=t2+500; Write(B,t2); t1:=t1-500; Write(A,t1); }
Durability
guarantees that once the user has been notified of success, the transaction will persist and not be undone
Example: A = 5000, B = 3000
T: Read(A,t1); If t1 > 500 { Read(B,t2); t2:=t2+500; Write(B,t2); t1:=t1-500; Write(A,t1); }
Transaction States
Concurrency Control
Objective:
ensures that database transactions are performed concurrently without the concurrency violating data integrity
guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the database
Example
T0: read(A); A := A - 50; write(A); read(B); B := B + 50; write(B);
T1: read(A); temp := A * 0.1; A := A - temp; write(A); read(B); B := B + temp; write(B);
Scheduling
(1)
(2)
(3)
Serializability
A schedule of a set of transactions is a linear ordering of their actions
e.g. for the simultaneous deposits example:
R1(X) R2(X) W1(X) W2(X)
A serial schedule is one in which all the steps of each transaction occur consecutively
A serializable schedule is one which is equivalent to some serial schedule
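One common way to test a schedule is conflict serializability, which is a sufficient condition for the definition above rather than the definition itself: build a precedence graph with an edge Ti → Tj whenever an action of Ti conflicts with a later action of Tj, and check that the graph has no cycle. The sketch below assumes a schedule is a list of (transaction, operation, item) triples; all names are illustrative.

```python
def precedence_edges(schedule):
    """Edges Ti -> Tj for each pair of conflicting actions, Ti's action first."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            # Two actions conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph of the schedule is acyclic."""
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    while nodes:
        # Nodes with no incoming edge from a remaining node can go first.
        free = {n for n in nodes
                if not any(v == n and u in nodes for u, v in edges)}
        if not free:
            return False  # every remaining node has a predecessor: a cycle
        nodes -= free
    return True


# R1(X) R2(X) W1(X) W2(X): the simultaneous deposits example.
interleaved = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]
serial = [(1, "R", "X"), (1, "W", "X"), (2, "R", "X"), (2, "W", "X")]
print(is_conflict_serializable(interleaved), is_conflict_serializable(serial))
```

For the interleaved schedule the graph contains both T1 → T2 and T2 → T1, so no equivalent serial order exists under this test.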
Lock
Definition
a synchronization mechanism for enforcing limits on concurrent access to the DB
one way of enforcing concurrency control policies
Lock types
Shared lock (LS): data can be read but not written
Exclusive lock (LX): data can be read and written
UN(D): unlock
Compatibility
      LS     LX
LS    true   false
LX    false  false
Example
T0: LX(A); read(A); A := A - 50; write(A); LX(B); read(B); B := B + 50; write(B); UN(A); UN(B);
T1: LX(A); read(A); temp := A * 0.1; A := A - temp; write(A); LX(B); read(B); B := B + temp; write(B); UN(A); UN(B);
Phase 2
locks are released and no locks are acquired
BOT
EOT
Example
T1: Lock(A) Read(A) Lock(B) Read(B) B:=B+A Write(B) Unlock(A) Unlock(B)
T2: Lock(B) Read(B) Lock(A) Read(A) Unlock(B) A:=A+B Write(A) Unlock(A)
T3: Lock(B) Read(B) B=B-50 Write(B) Unlock(B) Lock(A) Read(A) A=A+50 Write(A) Unlock(A)
T4: Lock(A) Read(A) Unlock(A) Lock(B) Read(B) Unlock(B) Print(A+B)
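Whether each of the four transactions follows two-phase locking can be checked mechanically: once a transaction has released any lock, it must not acquire another. The sketch below assumes each transaction is written as a list of action strings copied from the example; the function name `is_two_phase` is illustrative.

```python
def is_two_phase(actions):
    """True if no lock is acquired after the first unlock."""
    unlocked = False
    for action in actions:
        if action.startswith("Unlock"):
            unlocked = True            # the shrinking phase has begun
        elif action.startswith("Lock") and unlocked:
            return False               # a lock acquired in the shrinking phase
    return True


T1 = ["Lock(A)", "Read(A)", "Lock(B)", "Read(B)", "B:=B+A", "Write(B)",
      "Unlock(A)", "Unlock(B)"]
T2 = ["Lock(B)", "Read(B)", "Lock(A)", "Read(A)", "Unlock(B)", "A:=A+B",
      "Write(A)", "Unlock(A)"]
T3 = ["Lock(B)", "Read(B)", "B=B-50", "Write(B)", "Unlock(B)", "Lock(A)",
      "Read(A)", "A=A+50", "Write(A)", "Unlock(A)"]
T4 = ["Lock(A)", "Read(A)", "Unlock(A)", "Lock(B)", "Read(B)", "Unlock(B)",
      "Print(A+B)"]

results = {name: is_two_phase(t)
           for name, t in [("T1", T1), ("T2", T2), ("T3", T3), ("T4", T4)]}
print(results)
```

T1 and T2 acquire all their locks before releasing any, while T3 and T4 each lock a second item after an unlock.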
Deadlock
T0: LX(B); read(B); B := B + 50; write(B); LX(A); read(A); A := A - 50; write(A); UN(A); UN(B);
T1: LX(A); read(A); temp := A * 0.1; A := A - temp; write(A); LX(B); read(B); B := B + temp; write(B); UN(A); UN(B);
Resolving Deadlock
Detecting
Recovery when a deadlock happens
rollback
Uses a waiting graph
Avoiding
Resource ordering
Timeout
Wait-die
Wound-wait
Waiting Graph
Graph
Node: a transaction holding a lock or waiting for a lock
Edge T → U
U holds L(A)
T waits to lock A
T must wait until U unlocks A
A cycle in the graph means deadlock
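Deadlock detection on a waiting graph amounts to finding a cycle. The following depth-first search is a minimal sketch, assuming the graph is a dictionary that maps each transaction to the transactions it waits for; the function name `find_deadlock` is illustrative.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    visiting, done = set(), set()

    def visit(node, path):
        if node in done:
            return None
        if node in visiting:
            return path[path.index(node):]   # the cycle closes at this node
        visiting.add(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# T0 holds B and waits for A; T1 holds A and waits for B.
deadlocked = {"T0": ["T1"], "T1": ["T0"]}
# T0 waits for T1, but T1 waits for nobody.
fine = {"T0": ["T1"], "T1": []}
print(find_deadlock(deadlocked), find_deadlock(fine))
```

A detector would typically pick one transaction on the returned cycle as the victim and roll it back.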
Timeout
Set a time limit for each transaction
On time-out, roll back
Exercises
Crash Recovery
Transaction
collection of actions that preserve consistency
Consistent DB
Consistent DB
with assumption
IF T starts with a consistent state and T executes in isolation, THEN T leaves a consistent state
Data sharing
e.g., T1 and T2 in parallel
Failures
Events
Desired
Undesired: Expected, Unexpected
[Figure: processor (CPU), memory (M), disk (D)]
Recovery
Maintaining the consistency of the DB by ROLLBACK to the last consistent state.
Ensuring 2 properties
Atomicity
Durability
Using LOG
Transaction Log
A sequence of log records keeping track of the actions executed by the DBMS
<start T>
Logs the beginning of the transaction's execution
<commit T>
Transaction has finished
<abort T>
Transaction is cancelled
<T, X, v, w>
Transaction makes an update action: before the update X = v, after the update X = w
Transaction Log
Handled in main memory and written to external memory (disk) when possible
A = 8 16 B = 8 16 Actions Log
Memory
Checkpoint
Definition:
a moment at which intermediate results and a log record are saved to disk
initiated at specified intervals
Objective
minimize the amount of time and effort wasted on restart
the process can be restarted from the latest checkpoint rather than from the beginning
Undo-logging
Step  Action      t   Mem A  Mem B  Disk A  Disk B  Mem Log
1                                                   <start T>
2     Read(A,t)   8   8             8       8
3     t:=t*2      16  8             8       8
4     Write(A,t)  16  16            8       8       <T, A, 8>
5     Read(B,t)   8   16     8      8       8
6     t:=t*2      16  16     8      8       8
7     Write(B,t)  16  16     16     8       8       <T, B, 8>
8     Flush log
9     Output(A)   16  16     16     16      8
10    Output(B)   16  16     16     16      16
11                                                  <commit T>
12    Flush log
Undo-Logging Rules
(1) For every action, generate an undo log record (containing the old value)
(2) Before X is modified on disk, log records pertaining to X must be on disk (write ahead logging: WAL)
(3) Before commit is flushed to log, all writes of the transaction must be reflected on disk
For each Ti ∈ S
Write <abort Ti> to log
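Recovery with an undo log can be sketched as a backward scan that restores old values for every transaction with no commit or abort record, followed by writing an abort record for each of them. The code assumes log records are Python tuples such as `("update", "T", "A", 8)`, a simplification of `<T, A, 8>`, and that the disk is a dictionary; none of these representations come from the slides.

```python
def undo_recover(log, disk):
    """Undo the updates of every transaction that neither committed nor aborted."""
    finished = {rec[1] for rec in log if rec[0] in ("commit", "abort")}
    started = [rec[1] for rec in log if rec[0] == "start"]
    incomplete = [t for t in started if t not in finished]
    for rec in reversed(log):              # most recent record first
        if rec[0] == "update" and rec[1] in incomplete:
            _, _txn, item, old_value = rec
            disk[item] = old_value         # restore the value saved before the write
    log.extend(("abort", t) for t in incomplete)
    return disk


# Crash after Output(A) but before <commit T> reached the log.
log = [("start", "T"), ("update", "T", "A", 8), ("update", "T", "B", 8)]
disk = undo_recover(log, {"A": 16, "B": 8})
print(disk, log[-1])
```

If the log had ended with a commit record, rule (3) guarantees that both outputs had already reached disk, so the scan would change nothing.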
Redo-logging
Step  Action      t   Mem A  Mem B  Disk A  Disk B  Mem Log
1                                                   <start T>
2     Read(A,t)   8   8             8       8
3     t:=t*2      16  8             8       8
4     Write(A,t)  16  16            8       8       <T, A, 16>
5     Read(B,t)   8   16     8      8       8
6     t:=t*2      16  16     8      8       8
7     Write(B,t)  16  16     16     8       8       <T, B, 16>
8                                                   <commit T>
9     Flush log
10    Output(A)   16  16     16     16      8
11    Output(B)   16  16     16     16      16
                                                    <T, end>
Redo-logging Rules
(1) For every action, generate a redo log record (containing the new value)
(2) Before X is modified on disk (DB), all log records for the transaction that modified X (including commit) must be on disk
(3) Flush log at commit
(4) Write END record after DB updates are flushed to disk
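Recovery with a redo log runs the other way: scan forward and reapply the new values of committed transactions, ignoring the others, whose changes never reached disk under rule (2). The sketch assumes log records are tuples such as `("update", "T", "A", 16)` standing in for `<T, A, 16>`; this representation is illustrative.

```python
def redo_recover(log, disk):
    """Reapply the updates of every committed transaction, oldest first."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                        # scan forwards from the start of the log
        if rec[0] == "update" and rec[1] in committed:
            _, _txn, item, new_value = rec
            disk[item] = new_value         # write the new value recorded in the log
    return disk


committed_log = [("start", "T"), ("update", "T", "A", 16),
                 ("update", "T", "B", 16), ("commit", "T")]
# Crash after Output(A) but before Output(B): B is still old on disk.
recovered = redo_recover(committed_log, {"A": 16, "B": 8})

uncommitted_log = committed_log[:-1]
# Without a commit record nothing was written to disk, so nothing is redone.
untouched = redo_recover(uncommitted_log, {"A": 8, "B": 8})
print(recovered, untouched)
```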
Discussion
Undo Logging
data must be written to disk as soon as the transaction finishes
cost: disk accesses
Redo Logging
all modified blocks must be kept in memory until commit
cost: memory use
Undo/Redo Logging
Step  Action      t   Mem A  Mem B  Disk A  Disk B  Mem Log
1                                                   <start T>
2     Read(A,t)   8   8             8       8
3     t:=t*2      16  8             8       8
4     Write(A,t)  16  16            8       8       <T, A, 8, 16>
5     Read(B,t)   8   16     8      8       8
6     t:=t*2      16  16     8      8       8
7     Write(B,t)  16  16     16     8       8       <T, B, 8, 16>
8     Flush log
9     Output(A)   16  16     16     16      8
10                                                  <commit T>
11    Output(B)   16  16     16     16      16
Exercises
Index Management
File Organization
Data storage in file
records, blocks and access structures
Hashed Files: suited to selections on equality
Collection of buckets with primary & overflow pages
Hashing function over the search key attributes
Heap File
Organization
Data unordered
Write new data at end
[Figure: a header page linked to data pages, grouped into full pages and pages with free space]
A full file scan is needed for Search, Insert, Update, Delete operations
Indexing technique
Search key
Any subset of the fields of a relation can be the search key
The search key need not be a key of the relation
Index
a collection of data entries that supports efficient retrieval of the entries with a given key value k
Classes of Indexes
Primary vs. secondary: primary has primary key
Clustered vs. unclustered: order of records and index approximately same
Alternative 1 implies clustered, but not vice-versa
A file can be clustered on at most one search key
Dense vs. Sparse: dense has index entry per data value; sparse may skip some
Alternative 1 always leads to dense index
Every sparse index is clustered!
Sparse indexes are smaller; however, some useful optimizations are based on dense indexes
CLUSTERED
UNCLUSTERED
Data entries
Data Records
Data Records
B Tree Technique
Organization
Root is either a leaf node or a node having at least 2 child nodes
Except for the root and the leaves, every node has n children, where ⌈m/2⌉ ≤ n ≤ m
The length of every path from the root to a leaf is equal
Example
Search begins at root, and key comparisons direct it to a leaf.
Root: 13 17 24 30
Leaf entries: 2* 3* 5* 7* 14* 16*
Operations
Inserting
Deleting
Updating
Searching
Discuss for each: how to do it, and its complexity
B+ Tree Summary
B+ trees and other tree-structured indexes are ideal for range searches and good for equality searches.
Inserts/deletes leave the tree height-balanced; $\log_F N$ cost.
High fanout (F) means depth is rarely more than 3 or 4.
Almost always better than maintaining a sorted file.
Typically, 67% occupancy on average.
Note: the order (d) concept is replaced by a physical space criterion in practice (at least half full).
Records may be variable sized
Index pages typically hold more entries than leaves
Hashing Technique
A familiar idea:
Requires good hash function (may depend on data) Distribute data across buckets Often multiple items in same bucket (buckets might overflow)
Example
h(x) = x mod 4
[Figure: keys 1, 2, 4, 3 are stored in buckets 1 to 4; after 10, 6 and 12 are also stored, bucket 2 holds 2, 10, 6 and bucket 4 holds 4, 12]
Operations
Inserting
Deleting
Updating
Searching
Discuss for each: how to do it, and its complexity
Query Processing
Query Processing
Query plans & exec strategies
Standard relational operators
Query Optimization
Query Plans
Data-flow graph of relational algebra operators
Typically: determined by optimizer
SELECT *
FROM PressRel p, Clients c
WHERE p.Symbol = c.Symbol
  AND c.Client = 'Atkins'
  AND c.Symbol IN (SELECT CoSymbol FROM EastCoast)
[Figure: query plan. A Join on PressRel.Symbol = EastCoast.CoSymbol sits above a Join on PressRel.Symbol = Clients.Symbol and a Project on CoSymbol; the leaves are Scan PressRel, Select Client = 'Atkins' over Scan Clients, and Scan EastCoast]
Basic Principles
Many DB operations require reading tuples, tuple vs. previous tuples, or tuples vs. tuples in another table Techniques generally used:
Iteration: for/while loop comparing with all tuples on disk
Index: if the comparison is on an attribute that's indexed, look up matches in the index & return those
Sort: iteration against presorted data (interesting orders)
Hash: build a hash table of the tuple list, probe the hash table
Basic Operators
One-pass operators:
Scan
Select
Project
Multi-pass operators:
Join
Various implementations
Handling of larger-than-memory sources
Relation S
Variations
Tuple R p
Tuple S p
SOURCE R
SOURCE S
Two-Pass Algorithms
Sort-based
Need to do a multiway sort first (or have an index) Approximately linear in practice, 2 b(T) for table T
Hash-based
Store one relation in a hash table
(Sort-)Merge Join
Requires data sorted by join attributes
Merge and join sorted files, reading the files sequentially a block at a time
Preserves sorted order of the outer relation
Very efficient for presorted data
Can be hybridized with NL Join for range joins
May require a sort before (adds cost + delay)
Cost: b(R) + b(S) plus sort costs, if necessary
In practice, approximately linear, 3 (b(R) + b(S))
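The merge step of a sort-merge join can be sketched in memory as two cursors advancing over inputs that are already sorted on the join attribute. The block-at-a-time disk reads are not modelled here, and the sample rows and the attribute `Headline` are hypothetical.

```python
def merge_join(r, s, key_r, key_s):
    """Join two lists of dicts that are already sorted on their join keys."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][key_r] < s[j][key_s]:
            i += 1
        elif r[i][key_r] > s[j][key_s]:
            j += 1
        else:
            value = r[i][key_r]
            # Find the end of the run of equal keys in s.
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == value:
                j_end += 1
            # Pair every r tuple in the run with every s tuple in the run.
            while i < len(r) and r[i][key_r] == value:
                result.extend({**r[i], **t} for t in s[j:j_end])
                i += 1
            j = j_end
    return result


press_rel = [
    {"Symbol": "AAA", "Headline": "h1"},
    {"Symbol": "BBB", "Headline": "h2"},
    {"Symbol": "BBB", "Headline": "h3"},
]
clients = [
    {"Symbol": "BBB", "Client": "Atkins"},
    {"Symbol": "CCC", "Client": "Lee"},
]
joined = merge_join(press_rel, clients, "Symbol", "Symbol")
print(joined)
```

Each input is read once from front to back, which is why the cost is b(R) + b(S) when no sort is needed.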
Hash-Based Joins
Allows partial pipelining of operations with equality comparisons
Sort-based operations block, but allow range and inequality comparisons
Hash joins are usually done with a static number of hash buckets
Generally have fairly long chains at each bucket
What happens when memory is too small?
build
probe
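The build and probe phases of a hash join can be sketched as follows, assuming the build relation fits in memory so that a Python dictionary can play the role of the hash table. The sample rows and the attribute `Headline` are hypothetical.

```python
from collections import defaultdict


def hash_join(build, probe, build_key, probe_key):
    """Equi-join: hash the build input, then probe it with each probe tuple."""
    table = defaultdict(list)
    for b in build:                        # build phase
        table[b[build_key]].append(b)
    return [{**b, **p}                     # probe phase
            for p in probe
            for b in table.get(p[probe_key], ())]


clients = [
    {"Symbol": "BBB", "Client": "Atkins"},
    {"Symbol": "CCC", "Client": "Lee"},
]
press_rel = [
    {"Symbol": "BBB", "Headline": "h2"},
    {"Symbol": "AAA", "Headline": "h1"},
    {"Symbol": "BBB", "Headline": "h3"},
]
matches = hash_join(clients, press_rel, "Symbol", "Symbol")
print(matches)
```

Results come out as soon as each probe tuple is read, which is the partial pipelining mentioned above; only equality conditions can be handled this way.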
Cost Model
Size-Distribution Estimator
Rewriting
Principle [Lev01]
reformulating a client query to use available views replacing views by their definition View def.:
client query
V V T U
Algebraic Space
Search strategy
Planner
Cost Model
Method-Structure Space
Size-Distribution Estimator
Search space
Cost estimation
Search Space
Algebraic space
operators execution orders that are to be considered by the Planner for each query sent to it
Method-structure space
implementation choices that exist for the execution of each specified ordered series of operators
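Selections: $\sigma_{c_1 \wedge \dots \wedge c_n}(R) \equiv \sigma_{c_1}(\dots \sigma_{c_n}(R))$
$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$
Projections: $\pi_{a_1}(R) \equiv \pi_{a_1}(\dots(\pi_{a_n}(R)))$
Joins: $R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$
$(R \bowtie S) \equiv (S \bowtie R)$
Show that: $R \bowtie (S \bowtie T) \equiv (T \bowtie R) \bowtie S$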
More Equivalences
A projection commutes with a selection that only uses attributes retained by the projection.
A selection between attributes of the two arguments of a cross-product converts the cross-product to a join.
A selection on just attributes of R commutes with $R \bowtie S$ (i.e., $\sigma(R \bowtie S) \equiv \sigma(R) \bowtie S$).
Similarly, if a projection follows a join $R \bowtie S$, we can "push" it by retaining only attributes of R (and S) that are needed for the join or are kept by the projection.
For queries over a single relation, queries consist of a combination of selects, projects, and aggregate ops:
Each available access path (file scan / index) is considered, and the one with the least estimated cost is chosen.
The different operations are essentially carried out together (e.g., if an index is used for a selection, projection is done for each retrieved tuple, and the resulting tuples are pipelined into the aggregate computation).
Cost estimation
Cost factors
CPU cost + I/O cost + Communication cost Size of (intermediate) relations
Search strategy
Objective
exploring the set of alternative execution plans and finding the cheapest one
Taxonomy
Polynomial vs. Combinatorial
Heuristics vs. Systematic
Deterministic vs. Randomized
Transformative vs. Constructive
D C
A B C D
D C A B
step, using either an "interestingly ordered" plan or an additional sorting operator.
An N-1 way plan is not combined with an additional relation unless there is a join condition between them, unless all predicates in WHERE have been used up.
i.e., avoid Cartesian products if possible.
In spite of pruning the plan space, this approach is still exponential in the number of tables.
Trends Perspectives
Extending data model
More extensible More flexible
Distribution
More distributed
Heterogeneity
Example of XML
[Labels in the figure: processing instruction, open-tag, element, attribute, close-tag]
<?xml version="1.0" encoding="ISO-8859-1" ?>
<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
    <ee>db/labs/dec/SRC1997-018.html</ee>
    <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  </article>
</dblp>
root p-i i
attribute element l t
text
mastersthesis mdate
2002
2002
ms/Brown92
XML Schema
XML syntax
Better way of defining keys, using XPaths
Type sub-classing, more complex than in a programming language
Domains and built-in data types
Schema Example
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="mastersthesis" type="ThesisType"/>
  <xsd:complexType name="ThesisType">
    <xsd:sequence>
      <xsd:element name="author" type="xsd:string"/>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="year" type="xsd:integer"/>
      <xsd:element name="school" type="xsd:string"/>
      <xsd:element name="committeemember" type="CommitteeType" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="mdate" type="xsd:date"/>
    <xsd:attribute name="key" type="xsd:string"/>
    <xsd:attribute name="advisor" type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>
Discussion
Comparison of DTD and XML Schema
Internet System
Not just a web server or web application An application built over the Internet
Having many participants in client-server or server-to-server fashion
Exchanging data / code in distributed fashion for operating the application
Partitioning, replicating, translating data
Having code written in different environments, languages, etc.
Handling failures, firewalls, ...
Data integration
Aims
An integrated view on multi data sources A transparent access to multi data sources
Architecture:
Three tier (mediation)
....
Multi-tier (hierarchical mediation)
Problems
How to define the integrated view
How to locate data sources
How to forward user queries
Mediator-based architecture
MEDIATION
Mediation Schema
R(#K, A, B, C)
Q: R1 R2 R3
SOURCES
Source 1
Global schema
LAV
Source 1
Source 1 contains A
Peer-to-Peer Systems
A distributed system with a large number of peers.
Every participating peer
acts as both a client and a server
provides access to (some of) its resources
Network
Search: Resource discovery, Index
Storage: Replication, Managing updates, Robustness & fault tolerance
Security: Authenticity, Privacy & confidentiality
Classification
Unstructured systems : no restriction on data placement
Existing catalog/index Search : (mostly) flooding techniques, keyword search
Classification (2)
Centralized model
Global index (catalog) A central authority node (as a server)
Decentralized model :
No global index No central coordination
GRID
Database Machine
Submission Machine UI
Application
Main problems
Resource sharing
Computers, storage, sensors, networks, ...
Heterogeneity of device, mechanism, policy
Conditional sharing: negotiation, payment, ...
Ideas
Need for interoperability when different groups want to share resources
Diverse components, policies, mechanisms
E.g., standard notions of identity, means of communication, resource descriptions
Basic services
Authentication
Authorization
Activity control
Resource information
Resource brokering
Scheduling
Job submission, data access/migration and execution
Accounting
Compute Resources
Storage Resources
MU: Mobile Unit
MSS: Mobile Support Station
Fixed Host: No Wireless interface
Sensor Network
Kirk Martinez, Jane K. Hart, and Royan Ong, Environmental Sensor Networks, Computer, Volume 37, Issue 8, Aug. 2004 Page(s): 50 - 56