1 Intro 2 Up
• Database:
a very large, integrated collection of data.
• Models a real-world enterprise
– Entities (e.g., students, courses)
– Relationships
(e.g., Lance Armstrong is enrolled in 15-415)
– More recently, also includes active components
(e.g. “business logic”)
Is the WWW a DBMS?
• Fairly sophisticated search available
– crawler indexes pages for fast search
• But, currently
– data is mostly unstructured and untyped
– can’t manipulate the data
– few guarantees provided for freshness of data,
consistency across data items, fault tolerance, …
– Web sites typically have a DBMS in the
background to provide these functions.
• The picture is quickly changing
– New standards like XML can help data modeling
– Research groups are working on providing some
of this functionality across multiple web sites.
“Search” vs. Query
• “Search” can return only what’s been “stored”
“Yahoo Actors” JOIN “FECInfo”
(Courtesy of the Telegraph group)
Q: Did it Work?
• Thought Experiment 1:
– You and your project partner are editing the same file.
– You both save it at the same time.
– Whose changes survive?
Why Study Databases?
• Shift from computation to information
– always true for corporate computing
– Web made this point for personal computing
– more and more true for scientific computing
• Need for DBMS has exploded in recent years
– Corporate: retail swipe/clickstreams, “customer
relationship mgmt”, “supply chain mgmt”, “data
warehouses”, etc.
– Scientific: digital libraries, Human Genome project,
NASA Mission to Planet Earth, sensors
• DBMS encompasses much of CS in a practical
discipline
– OS, languages, theory, AI, multimedia, logic
– Yet traditional focus on real-world apps
About the course - Workload
• Projects this semester cover:
– DBMS Internals
(requires systems programming in “C”)
– Database Query design, optimization and processing
– Database Applications
• Other homework assignments and/or quizzes
• Exams – 1 Midterm & 1 Final
• Projects to be done in groups of 3
• http://www.cs.cmu.edu/~natassa/15-415
About the Course - Administrivia
• Textbook
– Ramakrishnan and Gehrke, 3rd Edition
• Announcements: academic.cs.15-415.announce
– Only the professor posts course announcements in this newsgroup
A 15-415 Infomercial
• A “free tasting” of things to come in this class:
– data modeling
– Query languages
– file systems & DBMSs
– concurrent, fault-tolerant data management
– DBMS architecture
• Next Time
– Database Design using the Entity-Relationship
model
OS Support for Data Management
Describing Data: Data Models
• A data model is a collection of concepts
for describing data.
Example: University Database
• Conceptual schema:
– Students(sid: string, name: string, login: string, age: integer, gpa: real)
– Courses(cid: string, cname: string, credits: integer)
– Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
– Relations stored as unordered files.
– Index on first column of Students.
• External Schema (View):
– Course_info(cid:string,enrollment:integer)
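The three schema levels above can be tried out in a few lines. This is a minimal sketch, assuming Python's built-in `sqlite3` module and an in-memory database; the primary keys, the definition of the `Course_info` view as a count over `Enrolled`, and all sample rows are assumptions made for illustration and are not stated on the slide.

```python
import sqlite3

# In-memory SQLite database. Table and view names follow the slide;
# keys and sample rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                           age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT REFERENCES Students(sid),
                           cid TEXT REFERENCES Courses(cid),
                           grade TEXT);
    -- External schema: a view exposing only per-course enrollment counts.
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 [("s1", "Ann", "ann@cs", 19, 3.4),
                  ("s2", "Bob", "bob@cs", 20, 3.1)])
conn.execute("INSERT INTO Courses VALUES ('15-415', 'Databases', 12)")
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "15-415", "A"), ("s2", "15-415", "B")])

# An application that only needs enrollment counts queries the view,
# never the underlying Enrolled table.
course_info = conn.execute("SELECT cid, enrollment FROM Course_info").fetchall()
print(course_info)  # [('15-415', 2)]
```

The application sees only `Course_info`; how `Enrolled` is stored or indexed stays hidden behind the view.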
Data Independence
• Applications insulated from how data is
structured and stored.
• Logical data independence: Protection
from changes in logical structure of data.
• Physical data independence: Protection
from changes in physical structure of data.
• Q: Why is this particularly important for
DBMS?
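Physical data independence can be illustrated with a small experiment: change how the data is stored and check that the application's query and its answer do not change. The sketch below assumes Python's built-in `sqlite3` module; the index name `idx_students_age` and the generated rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, "
             "age INTEGER, gpa REAL)")
# Generated sample rows; ages cycle through 18..22.
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 [(f"s{i}", f"name{i}", f"login{i}", 18 + i % 5, 3.0)
                  for i in range(100)])

query = "SELECT sid FROM Students WHERE age = 20 ORDER BY sid"
before = conn.execute(query).fetchall()

# Change the physical schema only: add an index. The query text is untouched.
conn.execute("CREATE INDEX idx_students_age ON Students(age)")
after = conn.execute(query).fetchall()

print(before == after)  # True: same answer, different access path
# The plan text differs between SQLite versions, so it is only printed here.
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
```

The application code (the `query` string) is identical before and after the index is created; only the DBMS's choice of access path may differ.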
Concurrency Control
• Concurrent execution of user programs:
key to good DBMS performance.
– Disk accesses are frequent and relatively slow
– Keep the CPU working on several programs
concurrently.
• Interleaving actions of different programs:
trouble!
– e.g., deposit & withdrawal on same account
• DBMS ensures such problems don’t arise:
users can pretend they are using a single-
user system. (called “Isolation”)
– Thank goodness!
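The deposit and withdrawal problem can be made concrete with a hand-written schedule. This is a toy simulation in plain Python, not real threads or a real DBMS; the amounts, the starting balance, and the `run` helper are invented for illustration.

```python
# Each "program" reads the balance, computes privately, then writes it back.
# The schedules below are hand-written interleavings, not real threads.

def run(schedule, balance=100):
    """Execute a list of (program, action) steps against one account."""
    local = {}                                  # each program's private copy
    delta = {"deposit": +50, "withdraw": -30}   # hypothetical amounts
    for prog, action in schedule:
        if action == "read":
            local[prog] = balance
        elif action == "write":
            balance = local[prog] + delta[prog]
    return balance

serial = [("deposit", "read"), ("deposit", "write"),
          ("withdraw", "read"), ("withdraw", "write")]
interleaved = [("deposit", "read"), ("withdraw", "read"),
               ("deposit", "write"), ("withdraw", "write")]

serial_result = run(serial)            # 100 + 50 - 30 = 120
interleaved_result = run(interleaved)  # withdrawal overwrites the deposit: 70
print(serial_result, interleaved_result)  # 120 70
```

In the interleaved schedule both programs read the balance 100, so the withdrawal's write silently discards the deposit. Isolation means the DBMS never exposes such a schedule to the user.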
Transaction: An Execution of a DB Program
• Key concept is a transaction: an atomic
sequence of database actions (reads/writes).
• Each transaction, executed completely, must
take the DB between consistent states.
• Users can specify simple integrity constraints
on the data. The DBMS enforces these.
– Beyond this, the DBMS does not understand the
semantics of the data.
– Ensuring that a single transaction (run alone)
preserves consistency is ultimately the user’s
responsibility!
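A small sketch of the points above, assuming Python's built-in `sqlite3` module. The `Accounts` table, the `CHECK (balance >= 0)` constraint, and the `transfer` helper are hypothetical. The DBMS enforces the declared constraint and undoes a failed transaction as a whole, while making the debit match the credit remains the job of the program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (id TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(src, dst, amount):
    """Move money as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE Accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit to dst was rolled back too.
        return False

ok = transfer("A", "B", 60)        # A: 100 -> 40, B: 0 -> 60
rejected = transfer("A", "B", 60)  # would make A negative, so nothing changes
balances = dict(conn.execute("SELECT id, balance FROM Accounts"))
print(ok, rejected, balances)      # True False {'A': 40, 'B': 60}
```

The second transfer credits B before the debit on A fails, yet B's balance is unchanged afterwards: the transaction is all-or-nothing.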
Scheduling Concurrent Transactions
• DBMS ensures that execution of {T1, ... , Tn} is
equivalent to some serial execution T1’ ... Tn’.
– Before reading/writing an object, a transaction requests
a lock on the object, and waits till the DBMS gives it the
lock. All locks are held until the end of the transaction.
(Strict 2PL locking protocol.)
– Idea: If an action of Ti (say, writing X) affects Tj (which
perhaps reads X), one of them, say Ti, will obtain the
lock on X first and Tj is forced to wait until Ti completes;
this effectively orders the transactions.
– What if Tj already holds a lock on Y and Ti later requests a
lock on Y? Each now waits for the other (deadlock!), so Ti or Tj
is aborted and restarted.
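The locking scenario above can be traced with a toy lock table. This is a simplified sketch, not how any particular DBMS implements Strict 2PL: it assumes exclusive locks only, a single-threaded simulation, and deadlock detection by following a waits-for chain. The `LockManager` class and its method names are hypothetical.

```python
class LockManager:
    """Toy lock table: exclusive locks only, no real threads (assumptions)."""

    def __init__(self):
        self.owner = {}      # object -> transaction holding its lock
        self.waits_for = {}  # transaction -> transaction it is blocked on

    def request(self, txn, obj):
        """Return 'granted', 'wait', or 'deadlock' for txn's lock request."""
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn
            return "granted"
        # Follow the waits-for chain from the holder; reaching txn is a cycle.
        t = holder
        while t is not None:
            if t == txn:
                return "deadlock"
            t = self.waits_for.get(t)
        self.waits_for[txn] = holder
        return "wait"

    def release_all(self, txn):
        """Strict 2PL: all locks are released together, at commit or abort."""
        for obj in [o for o, t in self.owner.items() if t == txn]:
            del self.owner[obj]
        self.waits_for.pop(txn, None)
        # Transactions that were blocked on txn are no longer waiting for it.
        for waiter in [w for w, t in self.waits_for.items() if t == txn]:
            del self.waits_for[waiter]

lm = LockManager()
events = [
    lm.request("Ti", "X"),  # Ti locks X
    lm.request("Tj", "Y"),  # Tj locks Y
    lm.request("Tj", "X"),  # Tj must wait for Ti
    lm.request("Ti", "Y"),  # Ti would wait for Tj, which already waits for Ti
]
lm.release_all("Ti")         # abort Ti to break the cycle
retry = lm.request("Tj", "X")
print(events, retry)  # ['granted', 'granted', 'wait', 'deadlock'] granted
```

Aborting Ti releases all of its locks at once, after which Tj's request for X succeeds; Ti would then be restarted from the beginning.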
The Log
Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery
components.
[Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods. Annotation: these layers must consider concurrency control and recovery.]
Components of a DBMS
[Figure: component diagram; labels include transaction, query, Data Definition, Buffer Manager, Lock Table, Storage Manager, Buffers, Buffer Pool.]
Advantages of a DBMS
• Data independence
• Efficient data access
• Data integrity & security
• Data administration
• Concurrent access, crash recovery
• Reduced application development time
• So why not use them always?
– Expensive/complicated to set up & maintain
– This cost & complexity must be offset by need
– General-purpose, not suited for special-purpose
tasks (e.g. text search!)
Summary (part 1)
• DBMS used to maintain, query large datasets.
– can manipulate data and exploit semantics
• Other benefits include:
– recovery from system crashes,
– concurrent access,
– quick application development,
– data integrity and security.
• Levels of abstraction provide data independence.
• In this course we will explore:
1) How to be a sophisticated user of DBMS technology
2) What goes on inside the DBMS
Summary, cont.