Chapter 1: Introduction
Outline
•   Definition of DBMS
•   Goals of DBMS
•   The Need for Databases
•   Data Models
•   Relational Databases
•   Database Design
•   Storage Manager
•   Query Processing
•   Transaction Manager
• Data
• Data can be facts related to any object
  inconsideration. For example your name, age,
  height, weight, etc are related to you. A picture,
  image, file, PDF etc can also be considered data.
• Database
• A database is a systematic collection of data
  that is organized so that it can easily be
  accessed, managed, and updated.
• DBMS
• A database-management system (DBMS) is a
  collection of interrelated data and a set of
  programs to access those data.
           Goals of a database
• The primary goal of a DBMS is to provide a way
  to store and retrieve database information that is
  both convenient and efficient.
• The database system must ensure the safety of
  the information stored, despite system crashes or
  attempts at unauthorized access.
• If data are to be shared among several users, the
  system must avoid possible anomalous results.
   Database System Applications
• Banking: For customer information, accounts, and loans, and
  banking transactions.
• Airlines: For reservations and schedule information.
• Universities: For student information, course registrations, and
  grades
• Credit card transactions: For purchases on credit cards and
  generation of monthly statements.
• Telecommunication: For keeping records of calls made,
  generating monthly bills, maintaining balances on prepaid
  calling cards, and storing information about the
  communication networks.
• Finance: For storing information about holdings, sales, and
  purchases of financial instruments such as stocks and bonds.
• Sales: For customer, product, and purchase information.
• Manufacturing: For management of supply chain and for
  tracking production of items in factories, inventories of items
  in warehouses/stores, and orders for items.
• Human resources: For information about employees, salaries,
  payroll taxes and benefits, and for generation of paychecks.
            University Database Example
• Application program examples
  – Add new students, instructors, and
    courses
  – Register students for courses, and
    generate class rosters
  – Assign grades to students, compute
    grade point averages (GPA) and
    generate transcripts
• In the early days, database
  applications were built directly on top
  of file systems
 Drawbacks of using file systems to store data
• Data redundancy and inconsistency
   – Multiple file formats, duplication of information in
     different files
• Difficulty in accessing data
   – Need to write a new program to carry out each new
     task
• Data isolation
   – Multiple files and formats
• Integrity problems
   – Integrity constraints (e.g., account balance > 0)
     become “buried” in program code rather than being
     stated explicitly
   – Hard to add new constraints or change existing ones
               Drawbacks of using file systems to store data (Cont.)
• Atomicity of updates
   – Failures may leave database in an inconsistent state with
     partial updates carried out
   – Example: Transfer of funds from one account to another
     should either complete or not happen at all
• Concurrent access by multiple users
   – Concurrent access needed for performance
   – Uncontrolled concurrent accesses can lead to
     inconsistencies
       • Example: Two people reading a balance (say 100) and updating
         it by withdrawing money (say 50 each) at the same time
• Security problems
   – Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above
  problems
          Drawbacks of using file systems to store data (Cont.)
• Durability
• In database systems, durability is the ACID
  property which guarantees that transactions
  that have committed will survive permanently.
  For example, if a flight booking reports that a
  seat has successfully been booked, then the
  seat will remain booked even if the system
  crashes. Durability can be achieved by flushing
  the transaction's log records to non-volatile
  storage before acknowledging commitment.
           Levels of Abstraction
• Physical level: describes how a record (e.g.,
  instructor) is stored.
• Logical level: describes data stored in database, and
  the relationships among the data.
     type instructor = record
              ID : string;
              name : string;
              dept_name : string;
              salary : integer;
             end;
• View level: application programs hide details of
  data types. Views can also hide information (such
  as an employee’s salary) for security purposes.
                         View of Data
An architecture for a database system
               Instances and Schemas
• Similar to types and variables in programming languages
• Logical Schema – the overall logical structure of the database
    – Example: The database consists of information about a set of
      customers and accounts in a bank and the relationship between
      them
         Analogous to type information of a variable in a program
• Physical schema–
           schema the overall physical structure of the database
• Instance – the actual content of the database at a particular point
  in time
    – Analogous to the value of a variable
• Physical Data Independence – the ability to modify the physical
  schema without changing the logical schema
    – Applications depend on the logical schema
    – In general, the interfaces between the various levels and
      components should be well defined so that changes in some parts do
      not seriously influence others.
                 Data Models
• A collection of tools for describing
   –   Data
   –   Data relationships
   –   Data semantics
   –   Data constraints
• Relational model
• Entity-Relationship data model (mainly for
  database design)
• Object-based data models (Object-oriented and
  Object-relational)
• Semistructured data model (XML)
• Other older models:
   – Network model
   – Hierarchical model
                   Relational Model
•   All the data is stored in various tables.
•   Example of tabular data in the relational model
                                                      Columns
                                                                Rows
A Sample Relational Database
    Data Definition Language (DDL)
•   Specification notation for defining the database schema
     Example:         create table instructor (
                         ID          char(5),
                         name         varchar(20),
                         dept_name varchar(20),
                         salary       numeric(8,2))
•   DDL compiler generates a set of table templates stored in a   data
    dictionary
•   Data dictionary contains metadata (i.e., data about data)
     – Database schema
     – Integrity constraints
           • Primary key (ID uniquely identifies instructors)
     – Authorization
           • Who can access what
        Data Manipulation Language (DML)
• Language for accessing and manipulating
  the data organized by the appropriate
  data model
  – DML also known as query language
• Two classes of languages
  – Pure – used for proving properties about
    computational power and for optimization
     • Relational Algebra
     • Tuple relational calculus
     • Domain relational calculus
  – Commercial – used in commercial systems
     • SQL is the most widely used commercial
       language
                                SQL
• The most widely used commercial language
• SQL is NOT a Turing machine equivalent
  language
• To be able to compute complex functions SQL
  is usually embedded in some higher-level
  language
• Application programs generally access
  databases through one of
  – Language extensions to allow embedded SQL
  – Application program interface (e.g., ODBC/JDBC)
    which allow SQL queries to be sent to a database
                       Database Design
The process of designing the general structure of the database:
               • Logical Design – Deciding on the database schema.
                 Database design requires that we find a “good”
                 collection of relation schemas.
                   – Business decision – What attributes should we record
                     in the database?
                   – Computer Science decision – What relation schemas
                     should we have and how should the attributes be
                     distributed among the various relation schemas?
               • Physical Design – Deciding on the physical layout of
                 the database
Application Architectures
      Database Design (Cont.)
• Is there any problem with this relation?
                        Design Approaches
• Need to come up with a methodology to
  ensure that each of the relations in the
  database is “good”
• Two ways of doing so:
  – Entity Relationship Model (Chapter 7)
     • Models an enterprise as a collection of entities and
       relationships
     • Represented diagrammatically by an entity-
       relationship diagram:
  – Normalization Theory (Chapter 8)
     • Formalize what designs are bad, and test for them
  Object-Relational Data Models
• Relational model: flat, “atomic” values
• Object Relational Data Models
  – Extend the relational data model by including
    object orientation and constructs to deal with
    added data types.
  – Allow attributes of tuples to have complex types,
    including non-atomic values such as nested
    relations.
  – Preserve relational foundations, in particular the
    declarative access to data, while extending
    modeling power.
  – Provide upward compatibility with existing
    relational languages.
                  Database Engine
• Storage manager
• Query processing
• Transaction manager
            Storage Management
• Storage manager is a program module that
  provides the interface between the low-level data
  stored in the database and the application
  programs and queries submitted to the system.
• The storage manager is responsible to the
  following tasks:
   – Interaction with the OS file manager
   – Efficient storing, retrieving and updating of data
• Issues:
   – Storage access
   – File organization
   – Indexing and hashing
                         Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
                      Query Processing (Cont.)
• Alternative ways of evaluating a given query
  – Equivalent expressions
  – Different algorithms for each operation
• Cost difference between a good and a bad
  way of evaluating a query can be enormous
• Need to estimate the cost of operations
  – Depends critically on statistical information
    about relations which the database must
    maintain
  – Need to estimate statistics for intermediate
    results to compute cost of complex expressions
          Transaction Management
• What if the system fails?
• What if more than one user is
  concurrently updating the same data?
• A transaction is a collection of
  operations that performs a single
  logical function in a database
  application.
           Transaction Management
• Transaction-management component
  ensures that the database remains in a
  consistent (correct) state despite system
  failures (e.g., power failures and
  operating system crashes) and
  transaction failures.
• Concurrency-control manager controls
  the interaction among the concurrent
  transactions, to ensure the consistency
  of the database.
                       Database Users
Naive users are unsophisticated users who interact with
the system by invoking one of the application programs
that have been written previously.
Application programmers are computer professionals
who write application programs. Application programmers
can choose from many tools to develop user interfaces.
Sophisticated users interact with the system without
writing programs. Instead, they form their requests in a
database query language
• Specialized users are sophisticated users who write
specialized database applications that do not fit into the
traditional data-processing framework.
                  Database Administrator
One of the main reasons for using DBMSs is to have central
control of both the data and the programs that access those
data. A person who has such central control over the system
is called a database administrator (DBA).
The functions of a DBA include:
• Schema definition. The DBA creates the original database
schema by executing a set of data definition statements in
the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The
DBA carries out changes to the schema and physical
organization to reflect the changing needs of the
organization, or to alter the physical organization to improve
performance.
             Database Administrator.
• Granting  of authorization for data access. By granting
different types of authorization, the database administrator
can regulate which parts of the database various users can
access.
• Routine maintenance. Examples of the database
administrator’s routine maintenance activities are:
Periodically backing up the database, either onto tapes or
onto remote servers, to prevent loss of data in case of
disasters such as flooding.
Ensuring that enough free disk space is available for
normal operations, and upgrading disk space as required.
Monitoring jobs running on the database and ensuring that
performances not degraded by very expensive tasks
submitted by some users
        Database Architecture
The architecture of a database systems is greatly influenced by
 the underlying computer system on which the database is
   running:
• Centralized
• Client-server
• Parallel (multi-processor)
• Distributed
  History of Database Systems
• 1950s and early 1960s:
  – Data processing using magnetic tapes for storage
     • Tapes provided only sequential access
  – Punched cards for input
• Late 1960s and 1970s:
  – Hard disks allowed direct access to data
  – Network and hierarchical data models in widespread use
  – Ted Codd defines the relational data model
     • Would win the ACM Turing Award for this work
     • IBM Research begins System R prototype
     • UC Berkeley begins Ingres prototype
  – High-performance (for the era) transaction processing
                  History (cont.)
• 1980s:
  – Research relational prototypes evolve into commercial systems
     • SQL becomes industrial standard
  – Parallel and distributed database systems
  – Object-oriented database systems
• 1990s:
  – Large decision support and data-mining applications
  – Large multi-terabyte data warehouses
  – Emergence of Web commerce
• Early 2000s:
  – XML and XQuery standards
  – Automated database administration
• Later 2000s:
  – Giant data storage systems
     • Google BigTable, Yahoo PNuts, Amazon, ..
End of Chapter 1