What is a Database?
1. A database is an organized collection of related data.
2. It is stored and accessed electronically using computers.
3. It represents a part of the real world and reflects changes in it.
4. It allows users to store, retrieve, manage, and manipulate data efficiently.
5. It is designed for a specific purpose, with intended users and applications.
Advantages of Using the DBMS Approach
Controlling Redundancy and Inconsistency
Reduces duplicate data using normalization
Ensures consistency across data
Allows controlled redundancy for performance
Automatically maintains consistency during updates
Restricting Unauthorized Access
Provides security and authorization features
Only authorized users can access or modify specific data
DBA controls user accounts and access levels
Efficient Query Processing
Uses indexes and buffers to speed up data access
Optimizes query execution plans for better performance
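To see an optimizer choose an access path, the sketch below asks SQLite (through Python's built-in `sqlite3` module) for its plan with `EXPLAIN QUERY PLAN`. The table and index names are illustrative assumptions, and the exact plan wording varies between SQLite versions. With an index on `dept`, the plan should report a search using that index rather than a full table scan.

```python
import sqlite3

# In-memory database; the table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# Ask the optimizer how it would run the query, without running it.
# The last column of each plan row is a human-readable description of one step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE dept = ?", ("Research",)
).fetchall()
steps = [row[-1] for row in plan]
print(steps)  # e.g. a SEARCH step that mentions idx_employee_dept
```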
Backup and Recovery
Restores database to previous state after failure
Protects data from hardware or system crashes
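A minimal sketch of backup and restore, assuming Python's `sqlite3` module and its `Connection.backup` method (Python 3.7+). It shows only the copy-and-restore idea; full DBMS recovery subsystems also use logs to undo or redo individual transactions, which this example does not model.

```python
import sqlite3

live = sqlite3.connect(":memory:")  # the working database
live.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
live.execute("INSERT INTO account VALUES (1, 100)")
live.commit()

snapshot = sqlite3.connect(":memory:")  # holds the backup copy
live.backup(snapshot)  # copy the whole live database into the snapshot

# Simulate a failure that destroys data after the backup was taken.
live.execute("DELETE FROM account")
live.commit()

# Recovery: copy the snapshot back over the damaged database.
snapshot.backup(live)
restored = live.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(restored)  # 100
```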
Multiple User Interfaces
Offers interfaces like query languages, forms, menus, and natural language
Suitable for different types of users (casual, technical, parametric)
Enforcing Integrity Constraints
Ensures valid and accurate data through rules
Includes data type checks, key constraints, and referential integrity
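The following sketch shows a DBMS rejecting invalid rows, using SQLite through Python's `sqlite3` module and a hypothetical EMPLOYEE/DEPARTMENT schema. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and its column types are loosely enforced (type affinity), so the example focuses on key, `CHECK`, and referential integrity constraints rather than data type checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE department (
    dnumber INTEGER PRIMARY KEY,
    dname   TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    ssn    TEXT PRIMARY KEY,                     -- key constraint
    name   TEXT NOT NULL,
    salary INTEGER CHECK (salary > 0),           -- domain rule
    dno    INTEGER REFERENCES department (dnumber)  -- referential integrity
);
INSERT INTO department VALUES (5, 'Research');
""")

def try_insert(row):
    """Attempt one insert; return None on success or the DBMS error message."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None

ok = try_insert(("123", "Smith", 30000, 5))          # valid row
duplicate = try_insert(("123", "Wong", 40000, 5))    # same primary key
bad_salary = try_insert(("456", "Zelaya", -1, 5))    # violates CHECK
bad_dept = try_insert(("789", "Borg", 55000, 99))    # no department 99
print(ok, duplicate, bad_salary, bad_dept, sep="\n")
```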
Other Benefits
Faster application development
Flexibility in modifying the database
Enforces organization-wide data standards
Provides up-to-date information
Reduces overall cost through shared resources
Main Characteristics of the Database Approach
Self-Describing Nature of a Database System
A database includes not only the data but also metadata (a description of the data structure, types, and constraints).
Metadata is stored in a system catalog, used by the DBMS and users.
This makes the system self-describing, unlike traditional file systems.
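In SQLite, the catalog is an ordinary table named `sqlite_master`, so a program can read the schema the same way it reads data. A small illustration using Python's `sqlite3` module (the `student` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The DBMS stored the table's description as data in its catalog.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
for obj_type, name, ddl in catalog:
    print(obj_type, name)  # table student
    print(ddl)             # the CREATE TABLE statement kept in the catalog
```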
Insulation Between Programs and Data, and Data Abstraction
Program-Data Independence: The data structure is stored in the catalog, separately from application programs, so structure changes don't require program changes.
Program-Operation Independence: The implementation of an operation (function/method) on data can change without affecting the programs that invoke it through its interface.
Data Abstraction: Users see a logical view, hiding physical storage details.
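A small illustration of program-data independence, assuming SQLite via Python's `sqlite3` and a hypothetical `student` table: the application function names only the columns it uses, so adding a column to the table does not require changing the function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Smith')")

def student_names(connection):
    # The program names only the columns it needs, not the full table layout.
    rows = connection.execute("SELECT name FROM student ORDER BY student_id")
    return [row[0] for row in rows]

before = student_names(conn)
conn.execute("ALTER TABLE student ADD COLUMN birth_date TEXT")  # structure change
after = student_names(conn)
print(before, after)  # ['Smith'] ['Smith']
```

The independence is not automatic: a program that unpacks `SELECT *` rows by position would break after the same `ALTER TABLE`, which is one reason to name columns explicitly.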
Support of Multiple Views of the Data
Different users can have customized views of the database.
A view may be a subset of the database or virtual data derived from actual tables.
Enhances security, simplicity, and usability for different users.
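As a sketch (SQLite via Python's `sqlite3`; the table, data, and view name are made up), a view can expose only the rows and columns one user group needs. The view stores a query, not data, so its rows are computed from `employee` whenever it is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES ('111', 'Smith',  'Research', 30000),
                            ('222', 'Wong',   'Research', 40000),
                            ('333', 'Zelaya', 'Admin',    25000);
-- Hypothetical directory view: hides salaries and other departments.
CREATE VIEW research_directory AS
    SELECT name, dept FROM employee WHERE dept = 'Research';
""")

cur = conn.execute("SELECT * FROM research_directory ORDER BY name")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns, rows)  # ['name', 'dept'] [('Smith', 'Research'), ('Wong', 'Research')]
```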
Sharing of Data and Multiuser Transaction Processing
Multiple users can access and update the database concurrently.
Concurrency control ensures correct and safe updates.
Supports transactions with properties like:
Atomicity (all or none),
Consistency (each transaction moves the database from one valid state to another),
Isolation (independent execution), and
Durability (permanent changes).
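A sketch of atomicity, using SQLite through Python's `sqlite3`. The `account` table and the `transfer` function are hypothetical. Using the connection as a context manager commits on success and rolls back on an exception, so a transfer that fails halfway leaves no partial update. Isolation and durability are not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0));
INSERT INTO account VALUES (1, 100), (2, 50);
""")

def transfer(connection, src, dst, amount):
    """Move money between two accounts as one atomic transaction."""
    with connection:  # commit if both updates succeed, roll back otherwise
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))

transfer(conn, 1, 2, 30)       # succeeds: balances become 70 and 80
try:
    transfer(conn, 2, 1, 500)  # second UPDATE violates the CHECK constraint
except sqlite3.IntegrityError:
    pass                       # the credit to account 1 is rolled back as well

balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)  # {1: 70, 2: 80}
```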
Actors on the Scene (3 different questions)
Database Administrator (DBA):
The DBA is responsible for managing the entire database system. Key duties include:
Authorizing user access and ensuring data security
Monitoring database usage and performance
Handling upgrades by acquiring hardware/software
Solving issues like security breaches or slow response times
In large organizations, a DBA may have support staff.
Database Designers:
Database designers decide what data should be stored and how to organize it. Their tasks include:
Interacting with users to understand data requirements
Creating database views for different user groups
Designing the final database structure to meet all user needs
End Users in DBMS
End users are individuals who interact with the database system to retrieve, modify, or generate
reports based on the stored data. They access the database through applications or directly via query
languages.
There are four main categories of end users:
1. Casual End Users
Use the database occasionally.
Require different information each time.
Use ad-hoc query languages (like SQL) to access data.
Typically middle- or high-level managers.
2. Naive or Parametric End Users
Largest group of end users.
Use standard, pre-defined (canned) transactions.
Need no knowledge of query languages.
Examples:
Bank tellers: check balances, post deposits/withdrawals.
Reservation clerks: check availability and book tickets/hotel rooms.
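A canned transaction can be thought of as a fixed, pre-written operation where the user supplies only parameters. A hypothetical sketch of a bank teller's deposit and balance check, using SQLite via Python's `sqlite3` (the table and function names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (acct_no TEXT PRIMARY KEY, balance INTEGER NOT NULL);
INSERT INTO account VALUES ('A-100', 500);
""")

def check_balance(connection, acct_no):
    """Canned query: look up one account's balance."""
    row = connection.execute(
        "SELECT balance FROM account WHERE acct_no = ?", (acct_no,)
    ).fetchone()
    if row is None:
        raise KeyError(acct_no)
    return row[0]

def post_deposit(connection, acct_no, amount):
    """Canned transaction: the teller supplies only the account and amount."""
    if amount <= 0:
        raise ValueError("deposit must be positive")
    with connection:  # commit on success, roll back on error
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE acct_no = ?", (amount, acct_no)
        )
    return check_balance(connection, acct_no)

print(post_deposit(conn, "A-100", 250))  # 750
```

The teller never writes SQL; the statements are fixed in the program and only parameter values change between runs.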
3. Sophisticated End Users
Have a deep understanding of DBMS tools and features.
Use advanced database tools to develop complex applications.
Often include engineers, scientists, and business analysts.
4. Stand-Alone Users
Use ready-made software packages with user-friendly interfaces.
Maintain personal databases for specific purposes.
Example:
A user of a tax software package managing personal financial data.
Three-Schema Architecture
Definition:
The Three-Schema Architecture is a framework used in database systems to separate the users' view of the database from the physical storage of data. It defines three levels of abstraction (internal, conceptual, and external) and helps achieve data independence and efficient database management.
Levels of Three-Schema Architecture:
1. Internal Level (Physical Schema)
Describes how data is physically stored in the database.
Uses a physical data model.
Deals with storage details: file formats, indexing, compression, etc.
Optimized for performance and efficiency.
Hidden from users.
2. Conceptual Level (Logical Schema)
Describes the logical structure of the entire database.
Uses a representational data model (e.g., relational model).
Contains entities, relationships, data types, and constraints.
Hides physical storage details from users.
Represents the community view of the database.
3. External Level (View Schema)
Provides different views of the database to different users.
Each view includes only relevant data for that user or group.
Uses external schemas, possibly designed in a high-level data model.
Hides both logical and physical details not needed by the user.
Helps ensure security and simplicity.
Mappings:
Mappings connect the three levels and handle the transformation of data and queries between them:
External ↔ Conceptual
Conceptual ↔ Internal
Enable data independence:
Logical data independence: Changes to the conceptual schema don't affect user views.
Physical data independence: Changes to physical storage don't affect the conceptual schema.
Data Independence:
Data Independence is the ability to modify the schema at one level of a database system without affecting the schema at the next higher level.
It ensures that changes in storage or logical structure do not require changes in the application programs or user views.
Types of Data Independence:
1. Logical Data Independence
Definition:
The capacity to change the conceptual schema without requiring changes to the external schemas or application programs.
Key Points:
Allows modifications like:
Adding/removing entities or attributes
Changing relationships or constraints
Application programs remain unaffected
Only view definitions and mappings may need updating
Supported by good DBMS architectures
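The sketch below (SQLite via Python's `sqlite3`; all names hypothetical) treats a view as the external schema. The conceptual schema changes from a single `employee` table to separate `employee` and `department` tables, and only the view definition, that is, the external/conceptual mapping, is rewritten. The application's query text stays the same and returns the same rows.

```python
import sqlite3

# External-level query used by the application; it never changes.
APP_QUERY = "SELECT name, dept_name FROM employee_info ORDER BY name"

conn = sqlite3.connect(":memory:")
# Version 1 of the conceptual schema: department name stored in EMPLOYEE.
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, dept_name TEXT);
INSERT INTO employee VALUES ('111', 'Smith', 'Research'), ('222', 'Wong', 'Admin');
CREATE VIEW employee_info AS SELECT name, dept_name FROM employee;
""")
before = conn.execute(APP_QUERY).fetchall()

# Version 2: departments move into their own table (a conceptual-schema change).
# Only the view, i.e. the external/conceptual mapping, is redefined.
conn.executescript("""
CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT UNIQUE);
INSERT INTO department (dname) SELECT DISTINCT dept_name FROM employee;
CREATE TABLE employee_v2 (ssn TEXT PRIMARY KEY, name TEXT,
                          dno INTEGER REFERENCES department (dnumber));
INSERT INTO employee_v2
    SELECT e.ssn, e.name, d.dnumber
    FROM employee AS e JOIN department AS d ON d.dname = e.dept_name;
DROP VIEW employee_info;
DROP TABLE employee;
ALTER TABLE employee_v2 RENAME TO employee;
CREATE VIEW employee_info AS
    SELECT e.name AS name, d.dname AS dept_name
    FROM employee AS e JOIN department AS d ON d.dnumber = e.dno;
""")
after = conn.execute(APP_QUERY).fetchall()
print(before == after)  # True
```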
2. Physical Data Independence
Definition:
The capacity to change the internal schema without needing to change the conceptual schema or external schemas.
Key Points:
Allows changes in:
File organization
Indexing methods
Storage devices
Improves performance and storage efficiency
Users and programs don't see any changes
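A minimal check of physical data independence (SQLite via Python's `sqlite3`; names hypothetical): adding an index changes the internal level, yet the same query returns the same result. Only the access path, and therefore performance, may differ.

```python
import sqlite3

QUERY = "SELECT name FROM employee WHERE dept = ? ORDER BY name"  # unchanged query

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [("Smith", "Research"), ("Wong", "Research"), ("Zelaya", "Admin")],
)

without_index = conn.execute(QUERY, ("Research",)).fetchall()
# Internal-level change: add an access path; tables and queries stay the same.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
with_index = conn.execute(QUERY, ("Research",)).fetchall()
print(without_index == with_index)  # True
```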
DBMS Component Modules and Their Interaction
The DBMS consists of various components that interact with each other to manage and process data efficiently. These components are grouped into two levels:
1. External/User Level (Top Part of Architecture)
This part includes interfaces for different types of users:
Casual Users use the Interactive Query Interface to write ad-hoc queries.
Application Programmers use programming languages with embedded DML (Data Manipulation Language).
DBA and System Analysts interact through DDL (Data Definition Language).
2. Internal System Level (Lower Part of Architecture)
This includes modules responsible for processing queries, managing data, and ensuring transaction
safety:
(a) DDL Compiler
Processes schema definitions written in DDL.
Stores metadata (schema descriptions) in the DBMS catalog.
(b) Query Compiler
Validates the query syntax, file names, and data elements.
Converts the query into an internal query form.
(c) Query Optimizer
Improves query efficiency by:
Reordering operations
Removing redundancies
Using indexes and algorithms
Consults the system catalog and generates optimized executable code.
(d) Precompiler
Extracts DML commands from application programs.
Sends DML to the DML compiler for conversion into database object code.
(e) Host Language Compiler
Compiles the rest of the host program.
Links the compiled DML and host code to form a canned transaction.
(f) Runtime Database Processor
Executes:
Privileged commands
Optimized query plans
Canned transactions with parameters
Works with:
System catalog (updates statistics)
Stored data manager for actual data access
Main memory buffers for managing data transfer
Concurrency control and backup & recovery systems for transaction safety
(g) Stored Data Manager
Uses operating system I/O services for reading/writing data between disk and memory.
Handles low-level data access operations.
Interaction Flow Summary:
1. User query → Query Interface
2. Parsed by Query Compiler → Optimized by Query Optimizer
3. Executed by Runtime Processor using Stored Data Manager
4. All modules access and update System Catalog
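The flow above can be mimicked with a toy, in-memory pipeline. Everything here is hypothetical and greatly simplified: the query arrives already parsed as a table/column/value triple, the catalog is a dictionary, and the "index" is a prebuilt dictionary keyed on `ssn`. It only shows the division of labor: the compiler validates names against the catalog, the optimizer consults the catalog to choose an access path, and the runtime processor executes the plan against stored data.

```python
from dataclasses import dataclass

# Hypothetical stored data (a real stored data manager would read disk pages).
DATA = {
    "employee": [
        {"ssn": "111", "name": "Smith", "dept": "Research"},
        {"ssn": "222", "name": "Wong", "dept": "Admin"},
        {"ssn": "333", "name": "Zelaya", "dept": "Research"},
    ],
}
# Hypothetical index on employee.ssn: key -> row.
INDEXES = {("employee", "ssn"): {row["ssn"]: row for row in DATA["employee"]}}
# Hypothetical system catalog: table -> columns and indexed columns.
CATALOG = {"employee": {"columns": ["ssn", "name", "dept"], "indexed": {"ssn"}}}

@dataclass
class Query:          # internal query form produced by the compiler
    table: str
    column: str
    value: str

@dataclass
class Plan:           # executable plan produced by the optimizer
    query: Query
    access_path: str  # "index" or "scan"

def compile_query(table, column, value):
    """Query compiler: check names against the catalog, build internal form."""
    if table not in CATALOG:
        raise ValueError(f"unknown table {table!r}")
    if column not in CATALOG[table]["columns"]:
        raise ValueError(f"unknown column {column!r}")
    return Query(table, column, value)

def optimize(query):
    """Query optimizer: consult the catalog and pick an access path."""
    indexed = query.column in CATALOG[query.table]["indexed"]
    return Plan(query, "index" if indexed else "scan")

def execute(plan):
    """Runtime processor: run the plan against the stored data."""
    q = plan.query
    if plan.access_path == "index":
        hit = INDEXES[(q.table, q.column)].get(q.value)
        return [] if hit is None else [hit]
    return [row for row in DATA[q.table] if row[q.column] == q.value]

by_key = optimize(compile_query("employee", "ssn", "222"))
by_dept = optimize(compile_query("employee", "dept", "Research"))
print(by_key.access_path, [r["name"] for r in execute(by_key)])    # index ['Wong']
print(by_dept.access_path, [r["name"] for r in execute(by_dept)])  # scan ['Smith', 'Zelaya']
```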
Attributes in ER Model
Attributes are properties or characteristics that describe an entity.
For example, the entity EMPLOYEE may have attributes like Name, SSN, Address, and Age.
Types of Attributes
1. Composite vs. Simple (Atomic) Attributes
Composite Attributes
These can be divided into smaller subparts, each with independent meaning.
Example:
Address → Street_address, City, State, Zip
Street_address → Number, Street, Apartment_number
This forms a hierarchy of attributes.
The value of a composite attribute is the combination of the values of its sub-attributes.
Simple (Atomic) Attributes
These cannot be divided further.
Example: SSN, Age, Employee_ID
2. Single-Valued vs. Multivalued Attributes
Single-Valued Attributes
These have only one value for a particular entity.
Example: Age of a person.
Multivalued Attributes
These can have multiple values for a single entity.
Example:
Colors for a car → can be 1 to 3 colors
College_degrees of a person
Shown as a double oval in ER diagrams and with curly braces {} in textual notation (e.g., {College_degrees}).
3. Stored vs. Derived Attributes
Stored Attributes
These are values directly stored in the database.
Example: Birth_Date of an employee.
Derived Attributes
These are calculated from stored attributes.
Example:
Age can be derived from Birth_Date
Total_Salary derived from basic salary + allowances
4. Null Value Attribute (Optional Attributes)
In some cases, an entity may not have a value for an attribute.
Example:
Apartment_number → not applicable for single-family homes
College_degrees → not applicable if a person has no degree
In such cases, the attribute is assigned a special NULL value.
NULL can also mean:
Value is unknown
Value is not applicable
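NULL behaves differently from ordinary values in SQL comparisons. A short sketch using SQLite via Python's `sqlite3` (hypothetical `person` table): a comparison with NULL is never true, so missing values must be tested with `IS NULL`. Note that SQL's single NULL does not record whether a value is unknown or not applicable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (name TEXT, apartment_number TEXT);
INSERT INTO person VALUES ('Smith', '4B'), ('Wong', NULL);
""")

# "= NULL" is never true, not even for NULL itself, so it matches no rows.
eq_rows = conn.execute("SELECT name FROM person WHERE apartment_number = NULL").fetchall()
# IS NULL is the correct test for a missing value.
is_rows = conn.execute("SELECT name FROM person WHERE apartment_number IS NULL").fetchall()
# Python receives SQL NULL as None.
wong_apt = conn.execute(
    "SELECT apartment_number FROM person WHERE name = 'Wong'").fetchone()[0]
print(eq_rows, is_rows, wong_apt)  # [] [('Wong',)] None
```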
5. Complex Attributes
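These are attributes that combine both composite and multivalued characteristics, nested in any way. In textual notation, composite components are grouped in parentheses () and multivalued ones in braces {}, e.g., {College_degrees(College, Year, Degree)} for a set of degrees that each have several parts.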
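One way to picture these attribute types in code is a set of Python dataclasses. This is an illustrative mapping, not ER notation: nested classes stand for composite attributes, a list for a multivalued attribute, a list of composite values for a complex attribute, `None` for NULL, and a method for a derived attribute. All names and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class StreetAddress:                          # composite sub-attribute
    number: str
    street: str
    apartment_number: Optional[str] = None    # None plays the role of NULL

@dataclass
class Address:                                # composite attribute
    street_address: StreetAddress
    city: str
    state: str
    zip_code: str

@dataclass
class Degree:                                 # one composite value ...
    college: str
    year: int
    degree: str

@dataclass
class Employee:
    ssn: str                                  # simple, single-valued, stored
    name: str
    birth_date: date                          # stored attribute
    address: Address
    # ... inside a multivalued attribute: a complex attribute
    college_degrees: List[Degree] = field(default_factory=list)

    def age(self, on: date) -> int:
        """Derived attribute: computed from birth_date, never stored."""
        had_birthday = (on.month, on.day) >= (self.birth_date.month, self.birth_date.day)
        return on.year - self.birth_date.year - (0 if had_birthday else 1)

smith = Employee(
    ssn="123456789",
    name="John Smith",
    birth_date=date(1965, 1, 9),
    address=Address(StreetAddress("731", "Fondren"), "Houston", "TX", "77036"),
)
print(smith.age(on=date(2025, 1, 8)))  # 59
```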
Cardinality Ratio
The cardinality ratio defines the maximum number of relationship instances that an entity can participate in with another entity.
Possible cardinality ratios:
1:1 – Each entity in A relates to at most one entity in B, and vice versa (e.g., each department has one manager, and each manager manages one department).
1:N – One entity in A can relate to many entities in B, but each entity in B relates to at most one in A (e.g., one department has many employees, but each employee belongs to one department).
N:1 – Many entities in A relate to one in B (the reverse of 1:N).
M:N – Many entities in A relate to many in B (e.g., employees can work on multiple projects, and projects can have multiple employees).
In ER diagrams, these are represented with numbers (1, N, M) on the lines connected to the
relationship diamond.
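When an ER schema is mapped to relational tables, the cardinality ratio decides where keys go. A sketch using SQLite via Python's `sqlite3`, with tables loosely modeled on the department/employee/project examples above (all names are assumptions): for 1:N the foreign key goes on the N side, and for M:N a separate table stores one row per related pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE project    (pnumber INTEGER PRIMARY KEY, pname TEXT);

-- 1:N (one department, many employees): foreign key on the N side,
-- so each employee row names at most one department.
CREATE TABLE employee (
    ssn  TEXT PRIMARY KEY,
    name TEXT,
    dno  INTEGER REFERENCES department (dnumber)
);

-- M:N (employees and projects): one row per related pair.
CREATE TABLE works_on (
    essn  TEXT    REFERENCES employee (ssn),
    pno   INTEGER REFERENCES project (pnumber),
    hours REAL,
    PRIMARY KEY (essn, pno)
);

INSERT INTO department VALUES (5, 'Research');
INSERT INTO project VALUES (1, 'ProductX'), (2, 'ProductY');
INSERT INTO employee VALUES ('111', 'Smith', 5), ('222', 'Wong', 5);
INSERT INTO works_on VALUES ('111', 1, 32.5), ('111', 2, 7.5), ('222', 2, 10.0);
""")

projects_of_smith = conn.execute(
    "SELECT pno FROM works_on WHERE essn = '111' ORDER BY pno").fetchall()
staff_on_product_y = conn.execute(
    "SELECT essn FROM works_on WHERE pno = 2 ORDER BY essn").fetchall()
print(projects_of_smith, staff_on_product_y)  # [(1,), (2,)] [('111',), ('222',)]
```

A 1:1 relationship can be mapped like 1:N with a `UNIQUE` constraint added on the foreign-key column.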
Participation Constraint (Existence Dependency)
The participation constraint defines the minimum number of relationship instances an entity must participate in. It shows whether the existence of an entity depends on its being related to another entity.
Types of Participation:
Total Participation:
Every entity must be involved in the relationship.
Example: Every employee must work for a department.
Shown with a double line in ER diagrams.
Partial Participation:
Some entities may or may not participate.
Example: Not all employees manage a department.
Shown with a single line in ER diagrams.
Together:
Cardinality Ratio + Participation Constraint = Structural Constraints
These help accurately model rules from the real-world system in an ER diagram.
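In a relational implementation, total participation is commonly enforced with a `NOT NULL` foreign key, while partial participation needs no constraint at all. A sketch with SQLite via Python's `sqlite3` (the schema is hypothetical): every employee must reference a department, but nothing forces every employee to appear as a department manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dnumber INTEGER PRIMARY KEY,
    dname   TEXT,
    mgr_ssn TEXT REFERENCES employee (ssn)   -- MANAGES: partial for EMPLOYEE
);
CREATE TABLE employee (
    ssn  TEXT PRIMARY KEY,
    name TEXT,
    dno  INTEGER NOT NULL REFERENCES department (dnumber)  -- WORKS_FOR: total
);
INSERT INTO department VALUES (5, 'Research', NULL);
INSERT INTO employee VALUES ('111', 'Smith', 5), ('222', 'Wong', 5);
UPDATE department SET mgr_ssn = '111' WHERE dnumber = 5;
""")

# Total participation: an employee without a department is rejected.
try:
    with conn:
        conn.execute("INSERT INTO employee VALUES ('333', 'Zelaya', NULL)")
    error = None
except sqlite3.IntegrityError as exc:
    error = str(exc)

# Partial participation: employees who manage nothing are perfectly valid.
non_managers = conn.execute(
    "SELECT name FROM employee "
    "WHERE ssn NOT IN (SELECT mgr_ssn FROM department WHERE mgr_ssn IS NOT NULL)"
).fetchall()
print(error)         # e.g. NOT NULL constraint failed: employee.dno
print(non_managers)  # [('Wong',)]
```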
1. Data Dictionary
A data dictionary is like a catalog or reference book for a database.
It stores information about the data, such as:
Table names
Column names and data types
Relationships
Constraints (e.g., primary key)
Who can access the data
Think of it as the "map of the database" that helps users and the DBMS understand what's inside.
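Relational systems expose their data dictionary through queries. In SQLite, for example, `PRAGMA table_info` lists a table's columns, declared types, `NOT NULL` flags, and primary-key membership. A sketch via Python's `sqlite3` (the table is hypothetical). SQLite has no user accounts, so access rights are not part of its catalog; multi-user server DBMSs record those as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    ssn    TEXT PRIMARY KEY,
    name   TEXT NOT NULL,
    salary INTEGER
)""")

# One row per column: (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(employee)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    flags = " ".join(f for f, on in (("NOT NULL", notnull), ("PK", pk)) if on)
    print(name, col_type, flags)
```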
2. Weak Entity
A weak entity is an entity that cannot be identified by its own attributes alone. It depends on another entity (called the owner) to be uniquely identified.
It doesn't have a primary key of its own.
It is always related to its owner (a strong entity) through an identifying relationship.
It has total participation in that identifying relationship.
It uses a partial key (like a name) plus the owner's key to identify each record.
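A sketch of mapping a weak entity to a table, using SQLite via Python's `sqlite3` and a hypothetical DEPENDENT entity owned by EMPLOYEE: the primary key combines the owner's key with the partial key, the owner reference is `NOT NULL` (total participation), and `ON DELETE CASCADE` removes dependents when their owner is deleted. The cascade is a common design choice, not a requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT);

-- dependent_name is only a partial key; the full key adds the owner's ssn.
CREATE TABLE dependent (
    essn           TEXT NOT NULL REFERENCES employee (ssn) ON DELETE CASCADE,
    dependent_name TEXT NOT NULL,
    relationship   TEXT,
    PRIMARY KEY (essn, dependent_name)
);

INSERT INTO employee VALUES ('111', 'Smith'), ('222', 'Wong');
-- The same partial key 'Alice' is allowed under different owners.
INSERT INTO dependent VALUES ('111', 'Alice', 'Daughter'), ('222', 'Alice', 'Spouse');
""")

conn.execute("DELETE FROM employee WHERE ssn = '111'")  # owner disappears ...
conn.commit()
remaining = conn.execute("SELECT essn, dependent_name FROM dependent").fetchall()
print(remaining)  # [('222', 'Alice')]  ... and so do its dependents
```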