Week 1: Introduction: Database System Concepts
Week 1: Introduction: Database System Concepts
Week 1: Introduction
2
University Database Example
▪ Application program examples
▪ Add new students, instructors, and courses
▪ Register students for courses, and generate class rosters
▪ Assign grades to students, compute grade point
averages (GPA) and generate transcripts
▪ In the early days, database applications were built directly on
top of file systems
3
Drawbacks of using file systems to store data
4
Drawbacks of using file systems to store data (Cont.)
▪ Atomicity of updates
▪ Failures may leave database in an inconsistent state with partial
updates carried out
▪ Example: Transfer of funds from one account to another should
either complete or not happen at all
▪ Concurrent access by multiple users
▪ Concurrent access needed for performance
▪ Uncontrolled concurrent accesses can lead to inconsistencies
▪ Example: Two people reading a balance (say 100) and
updating it by withdrawing money (say 50 each) at the same
time
▪ Security problems
▪ Hard to provide user access to some, but not all, data
5
Relational Model
▪ All the data is stored in various tables.
▪ Example of tabular data in the relational model Columns
Rows
6
A Sample Relational Database
7
Database Design
The process of designing the general structure of the database:
8
Database Design (Cont.)
▪ Is there any problem with this relation?
9
Design Approaches
▪ Need to come up with a methodology to ensure that each of the
relations in the database is “good”
▪ Two ways of doing so:
▪ Entity Relationship Model
▪ Models an enterprise as a collection of entities and
relationships
▪ Represented diagrammatically by an entity-relationship
diagram:
▪ Normalization Theory
▪ Formalize what designs are bad, and test for them
10
Data Definition Language (DDL)
▪ Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
▪ DDL compiler generates a set of table templates stored in a data dictionary
▪ Data dictionary contains metadata (i.e., data about data)
▪ Database schema
▪ Integrity constraints
▪ Primary key (ID uniquely identifies instructors)
▪ Authorization
▪ Who can access what
11
Data Manipulation Language (DML)
▪ Language for accessing and manipulating the data organized
by the appropriate data model
▪ DML also known as query language
▪ Two classes of languages
▪ Pure – used for proving properties about computational
power and for optimization
▪ Relational Algebra
▪ Tuple relational calculus
▪ Domain relational calculus
▪ Commercial – used in commercial systems
▪ SQL is the most widely used commercial language
12
SQL
13
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
14
Query Processing (Cont.)
▪ Alternative ways of evaluating a given query
▪ Equivalent expressions
▪ Different algorithms for each operation
▪ Cost difference between a good and a bad way of evaluating a
query can be enormous
▪ Need to estimate the cost of operations
▪ Depends critically on statistical information about relations
which the database must maintain
▪ Need to estimate statistics for intermediate results to compute
cost of complex expressions
15
XML: Extensible Markup Language
▪ Defined by the WWW Consortium (W3C)
▪ Originally intended as a document markup language not a
database language
▪ The ability to specify new tags, and to create nested tag structures
made XML a great way to exchange data, not just documents
▪ XML has become the basis for all new generation data interchange
formats.
▪ A wide variety of tools is available for parsing, browsing and
querying XML documents/data
16
History of Database Systems
▪ 1950s and early 1960s:
▪ Data processing using magnetic tapes for storage
▪ Tapes provided only sequential access
▪ Punched cards for input
▪ Late 1960s and 1970s:
▪ Hard disks allowed direct access to data
▪ Network and hierarchical data models in widespread use
▪ Ted Codd defines the relational data model
▪ Would win the ACM Turing Award for this work
▪ IBM Research begins System R prototype
▪ UC Berkeley begins Ingres prototype
▪ High-performance (for the era) transaction processing
17
History (cont.)
▪ 1980s:
▪ Research relational prototypes evolve into commercial systems
▪ SQL becomes industrial standard
▪ Parallel and distributed database systems
▪ Object-oriented database systems
▪ 1990s:
▪ Large decision support and data-mining applications
▪ Large multi-terabyte data warehouses
▪ Emergence of Web commerce
▪ Early 2000s:
▪ XML and XQuery standards
▪ Automated database administration
▪ Later 2000s:
▪ Giant data storage systems
▪ Google BigTable, Yahoo PNuts, Amazon, ..
18