Lect 1
Fall 2017
About the course – Administrivia
Instructor:
George Kollios, [email protected]
MCS 283, Mon 2:30-4:00 PM and Tue 1:00-2:30 PM
Teaching Fellows:
Mona Jalal, [email protected]
EMA 309, Tue/Thu 2:00-3:15 PM and Fri 10:15-11:45 AM
Baichuan Zhou, [email protected]
EMA 309, Wed 2:30-4:30 PM and Thu 2:30-4:30 PM
Home Page:
http://www.cs.bu.edu/fac/gkollios/cs460f17
Check frequently! Syllabus, schedule, assignments,
announcements…
Piazza site (you will be added soon)
Textbook
Grading
CS460
Homeworks: 25%
Midterm: 20%
Final: 30%
Programming Assignments: 25%
Examples:
Implement a Web application using a DBMS
Use a NoSQL system to analyze large datasets
(tentative) Use Amazon Cloud Services to perform data analysis on a large dataset
Grading
CS660
Homeworks: 20%
Midterm: 20%
Final: 25%
Programming Assignments: 25%
Extra Assignments: 10%
What is a Database?
Database:
A very large collection (of files) of related data
What is a Database Management System?
Database Management System (DBMS):
A software package/system that can be used to store, manage, and retrieve data from databases that persist for long periods of time!
Examples: Oracle, IBM DB2, MS SQL Server, MySQL, PostgreSQL, SQLite,…
Why Study Databases??
Why Databases??
Why not store everything in flat files, using the file system of the OS? It is cheap and simple…
Problem 1
Data Organization
redundancy and inconsistency
Multiple file formats, duplication of information in different files
Name, Course, Email, Grade
John Smith, [email protected], CS112, B
Mike Stonebraker, [email protected], CS234, A
Jim Gray, CS560, [email protected], A
John Smith, CS560, [email protected], B+
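This is a minimal Python sketch of how the redundancy above turns into inconsistency. It assumes two hypothetical flat files shaped like the rows above, and the e-mail addresses are invented placeholders. Each file needs its own parser because the column order differs. Updating John Smith's e-mail in only one file then leaves the two copies disagreeing.

```python
# Two hypothetical flat files with different column orders (e-mails are placeholders).
FILE_A = ["John Smith, jsmith@example.org, CS112, B"]          # Name, Email, Course, Grade
FILE_B = ["Jim Gray, CS560, jgray@example.org, A",
          "John Smith, CS560, jsmith@example.org, B+"]         # Name, Course, Email, Grade


def parse(rows, columns):
    """Split comma-separated rows; the caller must know each file's column order."""
    return [dict(zip(columns, (field.strip() for field in row.split(","))))
            for row in rows]


file_a = parse(FILE_A, ["name", "email", "course", "grade"])
file_b = parse(FILE_B, ["name", "course", "email", "grade"])

# John Smith's e-mail is stored twice (redundancy). Update only one copy...
file_a[0]["email"] = "john.smith@example.org"


def conflicting_emails(*files):
    """Return the names that map to more than one e-mail across all files."""
    emails = {}
    for records in files:
        for rec in records:
            emails.setdefault(rec["name"], set()).add(rec["email"])
    return {name: found for name, found in emails.items() if len(found) > 1}


# ...and the files now disagree (inconsistency).
print(conflicting_emails(file_a, file_b))
```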
Problem 2
Data retrieval:
Find the students registered for CS460
Find the students with GPA > 3.5
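With flat files, each such question becomes a custom program. The following is a minimal Python sketch that assumes a hypothetical CSV file of students; the column names and rows are invented for illustration. Every new question needs another hand-written scan over the whole file.

```python
import csv
import io

# Hypothetical flat file; column names and rows are invented for illustration.
STUDENTS_CSV = """sid,name,course,gpa
101,Alice,CS460,3.8
102,Bob,CS460,3.2
103,Carol,CS112,3.9
"""


def read_records(text):
    """Parse the whole file every time: there is no index to consult."""
    return csv.DictReader(io.StringIO(text))


def students_registered_for(course, text=STUDENTS_CSV):
    # Question 1 needs its own hand-written scan...
    return [r["name"] for r in read_records(text) if r["course"] == course]


def students_with_gpa_above(threshold, text=STUDENTS_CSV):
    # ...and question 2 needs another one, with its own type conversion.
    return [r["name"] for r in read_records(text) if float(r["gpa"]) > threshold]


print(students_registered_for("CS460"))   # ['Alice', 'Bob']
print(students_with_gpa_above(3.5))       # ['Alice', 'Carol']
```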
Problem 3
Data Integrity
Data Organization
View of Data
Database Schema
Data Organization
Relational Model
Example of tabular data in the relational model
Attributes: ID, Name, Street, City, Account
192-83-7465, Johnson, Alma, Palo Alto, A-101
019-28-3746, Smith, North, Rye, A-215
192-83-7465, Johnson, Alma, Palo Alto, A-201
321-12-3123, Jones, Main, Harrison, A-217
019-28-3746, Smith, North, Rye, A-201
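This is a minimal sketch of the table above as a relation, using Python's built-in `sqlite3` module with an in-memory database. The table name `customer_account` and its column names are assumptions for illustration; the rows are the ones shown above. Each question is answered by naming the attribute values wanted, independent of the order in which rows were stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names for the five attributes shown above.
conn.execute("""CREATE TABLE customer_account (
                    id TEXT, name TEXT, street TEXT, city TEXT, account TEXT)""")
conn.executemany(
    "INSERT INTO customer_account VALUES (?, ?, ?, ?, ?)",
    [("192-83-7465", "Johnson", "Alma", "Palo Alto", "A-101"),
     ("019-28-3746", "Smith", "North", "Rye", "A-215"),
     ("192-83-7465", "Johnson", "Alma", "Palo Alto", "A-201"),
     ("321-12-3123", "Jones", "Main", "Harrison", "A-217"),
     ("019-28-3746", "Smith", "North", "Rye", "A-201")],
)

# Which accounts does Johnson hold?
johnson_accounts = [acct for (acct,) in conn.execute(
    "SELECT account FROM customer_account WHERE name = ? ORDER BY account",
    ("Johnson",))]

# Who shares account A-201?
a201_holders = [name for (name,) in conn.execute(
    "SELECT name FROM customer_account WHERE account = ? ORDER BY name",
    ("A-201",))]

print(johnson_accounts)  # ['A-101', 'A-201']
print(a201_holders)      # ['Johnson', 'Smith']
```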
Data Organization
Data Storage
Where can data be stored?
Main memory
Secondary memory (hard disks)
Optical storage (DVDs)
Tertiary store (tapes)
How is data moved between storage levels (e.g., disk and main memory)? Determined by the buffer manager
How is data mapped to files? Determined by the file manager
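To make the buffer manager's job concrete, here is a toy Python sketch. It assumes a fixed number of in-memory frames and a least-recently-used (LRU) replacement policy, which is one common choice and not necessarily the one any particular DBMS uses. `read_page` is a hypothetical stand-in for a disk read. Writing back modified (dirty) pages and pinning pages in use are omitted.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool: caches up to `capacity` pages in memory, evicting LRU."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page      # hypothetical disk read: page_id -> contents
        self.frames = OrderedDict()     # page_id -> contents, least recent first
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.frames:      # page already in main memory
            self.hits += 1
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        self.misses += 1                # page must be fetched from disk
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)  # evict the least recently used page
        page = self.read_page(page_id)
        self.frames[page_id] = page
        return page


disk = {page_id: f"contents of page {page_id}" for page_id in range(10)}
pool = BufferManager(capacity=2, read_page=disk.__getitem__)
for page_id in [1, 2, 1, 3, 2]:
    pool.get(page_id)
print(pool.hits, pool.misses, list(pool.frames))  # 1 4 [3, 2]
```

With only two frames, page 2 is evicted when page 3 arrives and must be read again right after; a larger pool or a different policy would change that count.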
Data retrieval
Queries
Query = declarative data retrieval:
describes what data is wanted, not how to retrieve it
Ex. “Give me the students with GPA > 3.5” vs.
“Scan the student file and retrieve the records with GPA > 3.5”
Why?
1. Easier to write
2. Efficient to execute (why?)
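This minimal sketch contrasts the two styles using Python's built-in `sqlite3` module; the `student` table and its rows are hypothetical. Both versions return the same students. Only the declarative one leaves the DBMS free to choose the access method, for example an index, without the query being rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical student table; names and GPAs are invented.
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(101, "Alice", 3.8), (102, "Bob", 3.2), (103, "Carol", 3.9)])

# Declarative: state WHAT is wanted; the DBMS decides how to find it.
declarative = [name for (name,) in conn.execute(
    "SELECT name FROM student WHERE gpa > 3.5 ORDER BY sid")]

# Procedural: spell out HOW, fetching every record and filtering by hand.
procedural = []
for sid, name, gpa in conn.execute("SELECT sid, name, gpa FROM student ORDER BY sid"):
    if gpa > 3.5:
        procedural.append(name)

print(declarative, procedural)  # ['Alice', 'Carol'] ['Alice', 'Carol']

# After adding an index, the query text stays the same; SQLite may now pick a
# different plan (the exact EXPLAIN QUERY PLAN output varies by version).
conn.execute("CREATE INDEX student_gpa ON student (gpa)")
for row in conn.execute("EXPLAIN QUERY PLAN SELECT name FROM student WHERE gpa > 3.5"):
    print(row)
```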
SQL
Data retrieval: Indexing
How can we quickly answer the query “Find the student with SID = 101”?
One approach is to scan the student table, check every student, and return the one with SID = 101… very slow for large databases
1st: Keep the student records sorted on SID and do a binary search… but updates are expensive, since the file must stay sorted…
2nd: Use a dynamic search tree! It allows insertions, deletions, and updates while keeping the records sorted. In databases we use the B+-tree (a multiway search tree)
3rd: Use a hash table. Much faster for exact-match queries… but it cannot support range queries. (Also, special hashing schemes are needed for dynamic data)
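This rough Python sketch covers the scan, the sorted-file, and the hash-table approaches; the dynamic search tree is not shown here. It assumes a small hypothetical list of (SID, name) records held in memory. The standard `bisect` module provides the binary search, and a `dict` stands in for the hash table.

```python
from bisect import bisect_left, bisect_right, insort

# Hypothetical student records (SID, name); the data is invented.
records = [(3, "Ann"), (11, "Ben"), (30, "Cy"), (101, "Dee"), (120, "Eve")]


def find_scan(sid):
    """Baseline: check every record, O(n)."""
    for s, name in records:
        if s == sid:
            return name
    return None


# Approach 1: keep the records sorted on SID and binary-search, O(log n).
records.sort()
sids = [s for s, _ in records]


def find_sorted(sid):
    i = bisect_left(sids, sid)
    return records[i][1] if i < len(sids) and sids[i] == sid else None


def range_sorted(lo, hi):
    """Range query: all records with lo <= SID <= hi, already in SID order."""
    return records[bisect_left(sids, lo):bisect_right(sids, hi)]


def insert_sorted(sid, name):
    # The update problem: keeping the lists sorted shifts elements, O(n).
    insort(sids, sid)
    insort(records, (sid, name))


# Approach 3: hash table (a Python dict), O(1) expected for exact matches...
by_sid = dict(records)


def find_hash(sid):
    return by_sid.get(sid)


# ...but a range query cannot use the hash structure: every key is examined.
def range_hash(lo, hi):
    return sorted((s, n) for s, n in by_sid.items() if lo <= s <= hi)
```

For example, `range_sorted(10, 101)` returns `[(11, 'Ben'), (30, 'Cy'), (101, 'Dee')]` by slicing between two binary-search positions, while `range_hash(10, 101)` gets the same answer only by touching every entry.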
[Figure: B+Tree Example (B=4), showing the root and leaf nodes; keys range from 3 to 200.]
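Here is a compact Python sketch of a B+-tree with insertion, exact-match search, and range search. It makes several assumptions:
- keys are unique (as SIDs would be);
- records are omitted, so leaves hold only keys;
- there is no deletion;
- the hypothetical parameter `max_keys` is read as the maximum number of keys per node, which is one possible reading of B=4.

The keys inserted in the usage lines are the distinct values visible in the figure. The resulting tree shape need not match the figure exactly.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []       # sorted keys
        self.children = []   # child nodes (internal nodes only)
        self.next = None     # right-sibling link (leaves only)


class BPlusTree:
    """Minimal B+-tree over unique keys: insert, exact search, range search."""

    def __init__(self, max_keys=4):
        self.max_keys = max_keys   # hypothetical node capacity (keys per node)
        self.root = Node(leaf=True)

    def _find_leaf(self, key):
        node = self.root
        while not node.leaf:
            # Keys equal to a separator live in the right subtree.
            node = node.children[bisect_right(node.keys, key)]
        return node

    def search(self, key):
        leaf = self._find_leaf(key)
        i = bisect_left(leaf.keys, key)
        return i < len(leaf.keys) and leaf.keys[i] == key

    def range(self, lo, hi):
        """All keys k with lo <= k <= hi, following the leaf chain."""
        result, leaf = [], self._find_leaf(lo)
        while leaf is not None:
            for k in leaf.keys:
                if k > hi:
                    return result
                if k >= lo:
                    result.append(k)
            leaf = leaf.next
        return result

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:            # root overflowed: tree grows one level
            separator, right = split
            new_root = Node(leaf=False)
            new_root.keys = [separator]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below node; return (separator, new right node) if node split."""
        if node.leaf:
            node.keys.insert(bisect_left(node.keys, key), key)
            if len(node.keys) <= self.max_keys:
                return None
            mid = len(node.keys) // 2
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right     # leaf split: COPY first right key up
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        separator, child = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, child)
        if len(node.keys) <= self.max_keys:
            return None
        mid = len(node.keys) // 2
        right = Node(leaf=False)
        up = node.keys[mid]                 # internal split: MOVE middle key up
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return up, right


tree = BPlusTree(max_keys=4)
for sid in [3, 5, 11, 30, 35, 100, 101, 110, 120, 130, 150, 156, 179, 180, 200]:
    tree.insert(sid)
print(tree.search(101), tree.search(102))  # True False
print(tree.range(100, 150))                # [100, 101, 110, 120, 130, 150]
```

The leaf chain (`next`) is what makes range queries cheap: after one root-to-leaf descent, the scan walks sideways through sorted leaves. A hash table cannot support that.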
Data Integrity
Transaction processing
John:
1. get balance
2. if balance > $50
3. balance = balance - $50
4. update balance
Jane:
1. get balance
2. if balance > $100
3. balance = balance - $100
4. update balance
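This minimal Python sketch shows why these two programs need transaction processing. It assumes John and Jane withdraw concurrently from the same account, with a hypothetical starting balance of $400. Each transaction is a generator that pauses between steps, so a schedule can interleave them deterministically.
- Run one after the other, the balance ends at $250.
- Interleaved, both read $400 and John's update is overwritten (a lost update).

```python
def withdraw(account, amount):
    """One transaction, paused (yield) between its steps so they can interleave."""
    balance = account["balance"]        # 1. get balance
    yield
    if balance > amount:                # 2. if balance > amount
        balance = balance - amount      # 3. balance = balance - amount
    yield
    account["balance"] = balance        # 4. update balance


def run(schedule, initial=400):
    """Advance John's and Jane's transactions in the given order."""
    account = {"balance": initial}      # hypothetical starting balance
    txns = {"John": withdraw(account, 50), "Jane": withdraw(account, 100)}
    for who in schedule:
        # The last step ends the generator; the None default absorbs StopIteration.
        next(txns[who], None)
    return account["balance"]


serial = ["John"] * 3 + ["Jane"] * 3
interleaved = ["John", "Jane"] * 3
print(run(serial))       # 250: both withdrawals applied
print(run(interleaved))  # 300: John's $50 withdrawal is lost
```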
Data Integrity
Recovery
Recovery management
Database Architecture
[Figure: database architecture. Actors: DB programmer (code with embedded queries), user (queries), DBA (DDL commands).]
Big Data and NoSQL
Outline