Introduction to Database Systems
Joe Hellerstein
and Christopher Olston
Fall 2005
Queries for Today
• What?
• Why?
• Who?
• How?
• For instance?
What: Database Systems
Then
What: Database Systems
Today
So… What is a Database?
• If it isn’t “published”, it can’t be searched!
What: A “Database Query” Approach
“Yahoo Actors” JOIN “FECInfo”
(Courtesy of the Telegraph research group @Berkeley)
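A minimal sketch of what joining two published sources could look like, using Python's built-in sqlite3 module. The table names, columns, and rows below are hypothetical placeholders, not the actual Yahoo or FECInfo data, and this is not the Telegraph group's implementation.

```python
import sqlite3

# Hypothetical schemas and placeholder rows; NOT the real Yahoo or FECInfo data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yahoo_actors (name TEXT PRIMARY KEY, movie TEXT);
    CREATE TABLE fec_info (donor TEXT, recipient TEXT, amount INTEGER);
    INSERT INTO yahoo_actors VALUES ('Actor A', 'Movie 1'), ('Actor B', 'Movie 2');
    INSERT INTO fec_info VALUES ('Actor A', 'Candidate X', 500),
                                ('Donor C', 'Candidate Y', 250);
""")

def actor_donations(conn):
    """Join the two tables on the person's name."""
    return conn.execute("""
        SELECT a.name, a.movie, f.recipient, f.amount
        FROM yahoo_actors AS a
        JOIN fec_info AS f ON f.donor = a.name
        ORDER BY a.name
    """).fetchall()

rows = actor_donations(conn)
print(rows)
```

Only people who appear in both tables survive the join, which is something a keyword search over either source alone cannot express.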
Q: Did it Work?
What: Is a File System a DBMS?
• Thought Experiment 1:
– You and your project partner are editing the same file.
– You both save it at the same time.
– Whose changes survive?
A) Yours B) Partner’s C) Both D) Neither E) ???
• Thought Experiment 2:
– You’re updating a file.
– The power goes out.
– Which changes survive?
A) All B) None C) All Since Last Save D) ???
Q: How do you write programs over a subsystem when it promises you only “???”?
A: Very, very carefully!!
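Thought Experiment 1 can be tried out with a small sketch. It assumes both editors rewrite the whole file from a private copy, one after the other; real editors and file systems may behave differently, which is exactly the “???” problem. The file name and text are made up for illustration.

```python
import os
import tempfile

def edit_and_save(path, original, addition):
    """Save the editor's private copy (original + addition) over the whole file."""
    with open(path, "w") as f:
        f.write(original + addition)

path = os.path.join(tempfile.mkdtemp(), "project.txt")
with open(path, "w") as f:
    f.write("shared draft\n")

# Both editors open the same version of the file.
with open(path) as f:
    yours = f.read()
with open(path) as f:
    partners = f.read()

# Each saves a full copy; the second save silently overwrites the first.
edit_and_save(path, yours, "your change\n")
edit_and_save(path, partners, "partner's change\n")

with open(path) as f:
    final = f.read()
print(final)
```

Under these assumptions the last writer wins and the other edit is lost without any error being reported.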
OS Support for Data Management
• Data can be stored in RAM
– this is what every programming language offers!
– RAM is fast, and random access
– Isn’t this heaven?
• Every OS includes a File System
– manages files on a magnetic disk
– allows open, read, seek, close on a file
– allows protections to be set on a file
– drawbacks relative to RAM?
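A rough sketch of the file-system interface listed above, using Python's standard file calls. The file name, the 8-byte record layout, and the sample records are assumptions made for the example.

```python
import os
import stat
import tempfile

RECORD_SIZE = 8  # assumed fixed record width, in bytes

path = os.path.join(tempfile.mkdtemp(), "records.dat")

# Write three fixed-width records, padded with spaces.
with open(path, "wb") as f:
    for name in (b"alice", b"bob", b"carol"):
        f.write(name.ljust(RECORD_SIZE, b" "))

# Set protections: owner may read and write.
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

def read_record(path, index):
    """open, seek to the record's byte offset, read it, close."""
    f = open(path, "rb")
    try:
        f.seek(index * RECORD_SIZE)
        return f.read(RECORD_SIZE).rstrip(b" ")
    finally:
        f.close()

second = read_record(path, 1)
print(second)
```

The program itself must know the byte layout of the records; the file system only sees a sequence of bytes.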
Database Management Systems
• “Knowledge is power.” -- Sir Francis Bacon
• representing information
– data modeling
• languages and systems for querying data
– complex queries & query semantics*
– over massive data sets
• concurrency control for data manipulation
– controlling concurrent access
– ensuring transactional semantics
• reliable data storage
– maintain data semantics even if you pull the plug
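As a small illustration of transactional semantics, the sketch below uses Python's built-in sqlite3 module. The account names and balances are made up, and the "crash" is simulated with an exception; an in-memory database shows the all-or-nothing behavior but does not demonstrate surviving a real power loss.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("Savings", 100), ("Checking", 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except RuntimeError:
        pass  # the partial debit has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

transfer(conn, "Savings", "Checking", 30, fail_midway=True)
after_failure = balances(conn)

transfer(conn, "Savings", "Checking", 30)
after_success = balances(conn)
print(after_failure, after_success)
```

The failed transfer leaves no trace: the debit is undone, so the total across both accounts stays the same in every state a reader can observe.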
• We will see
– Algorithms and cost analyses
– System architecture and implementation
– Resource management and scheduling
– Computer language design, semantics and optimization
– Applications of AI topics including logic and planning
– Statistical modeling of data
Why take this class?
E. It isn’t that much work.
• Instructors
– Prof. Joe Hellerstein, UC Berkeley
– Dr. Christopher Olston, Yahoo! Research
• TAs
– John Lo
– Nathan Burkhart
– Alex Rasmussen
How? Workload
• http://inst.eecs.berkeley.edu/~cs186
• Prof. Office Hours:
– Hellerstein: 685 Soda Hall, TBA (check web page)
– Olston: 687 Soda Hall, Thursday 2PM
• TAs
– Office Hours: TBA (check web page)
• Textbook
– Ramakrishnan and Gehrke, 3rd Edition
• Grading, hand-in policies, etc. will be on Web Page
• Cheating policy: zero tolerance
– We have the technology…
• Team Projects
– Teams of 2
– Peer evaluations.
• Be honest! Feedback is important. Trend is more important than individual project.
• Class bulletin board - ucb.class.cs186
– read it regularly and post questions/comments.
– mail broadcast to all TAs will not be answered
– mail to the cs186 course account will not be answered
• Class Blog for announcements
Agenda for the rest of today
• A “free tasting” of central concepts in DB field:
– queries (vs. search)
– data independence
– transactions
• Next Time
– the Relational data model
• Why?
[Figure: a transaction takes the database from consistent state 1 to consistent state 2]
Example
Suppose:
1. T1 locks Savings
2. T2 locks Checking
• Now, if each transaction next requests the lock the other holds, neither can proceed!
– called “deadlock”
– DBMS will abort and restart one of T1 and T2
– Need “undo” mechanism that preserves consistency
• DBMS ensures:
– atomicity even if xact aborted (due to deadlock, system crash, …)
– durability of committed xacts, even if system crashes.
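One common way to notice a deadlock like the one in the example is to look for a cycle among waiting transactions. The sketch below is illustrative only and is not how any particular DBMS does it. It assumes T1 next requests Checking and T2 next requests Savings (the slide lists only the first two steps), and the victim choice is arbitrary.

```python
def find_deadlock(holds, wants):
    """Return a list of transactions forming a wait-for cycle, or None.

    holds: dict mapping resource -> transaction currently holding its lock
    wants: dict mapping transaction -> resource it is blocked on
    """
    for start in wants:
        path = []
        txn = start
        # Follow "txn waits for the holder of the resource it wants".
        while txn in wants and txn not in path:
            path.append(txn)
            txn = holds.get(wants[txn])
        if txn in path:
            return path[path.index(txn):]
    return None

holds = {"Savings": "T1", "Checking": "T2"}
wants = {"T1": "Checking", "T2": "Savings"}  # assumed next lock requests

cycle = find_deadlock(holds, wants)
print(cycle)

# Abort one transaction in the cycle (arbitrary choice) and release its locks.
victim = cycle[-1]
holds = {res: txn for res, txn in holds.items() if txn != victim}
wants = {txn: res for txn, res in wants.items() if txn != victim}
after_abort = find_deadlock(holds, wants)
print(after_abort)
```

Once the victim's locks are released, the surviving transaction can acquire what it needs; the aborted one must have its partial work undone before it restarts.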
[Figure: DBMS layers, from Query Optimization and Execution through Relational Operators down to the DB]
Advantages of a DBMS
• Data independence
• Efficient data access
• Data integrity & security
• Data administration
• Concurrent access, crash recovery
• Reduced application development time
• So why not use them always?
– Expensive/complicated to set up & maintain
– This cost & complexity must be offset by need
– General-purpose, not suited for special-purpose tasks (e.g. text search!)
Databases make these folks happy ...
• DBMS vendors, programmers
– Oracle, IBM, MS …
• End users in many fields
– Business, education, science, …
• DB application programmers
– Build data entry & analysis tools on top of DBMSs
– Build web services that run off DBMSs
• Database administrators (DBAs)
– Design logical/physical schemas
– Handle security and authorization
– Data availability, crash recovery
– Database tuning as needs evolve