CS 221 Lecture Manual
WEEK 1
File-Based System
collection of application programs that perform services for the end-users such as the
production of reports
an early attempt to computerize the manual filing system
was developed in response to the needs of industry for more efficient data access
Database – a shared collection of logically related data, and description of this data, designed to meet
the information needs of an organization
Database Management System (DBMS) – a software system that enables users to define, create,
maintain, and control access to the database
The structure of data files is stored in the DBMS catalog separately from the access
programs (program-data independence). In a DBMS environment, adding another piece of
data requires only a change to the description of the data in the meta-data; no programs
are changed.
Data abstraction – characteristic that allows program-data independence and program
operation independence
3. Support of multiple views of the data
A database typically has many users, each of whom may require a different perspective or
view of the dbase.
View – may be a subset of dbase
4. Sharing of data and multiuser transaction processing
Must allow multiple users to access the dbase at the same time
Concurrency control – the DBMS must include software to ensure that several users
trying to update the same data do so in a controlled manner so that the result of the
updates is correct
On-Line Transaction Processing (OLTP) – applications in which many users query and update
the dbase concurrently; the DBMS should ensure that each record can be updated by only one
user at a time
Categories of End-Users
1. Casual end users – occasionally access the dbase, but they may need different information
each time
typically middle- or high-level managers or other occasional browsers
2. Naïve or parametric end users – make up a sizeable portion of dbase end users
their main job function revolves around constantly querying and updating the dbase
example: bank tellers, reservation clerks for airlines, hotels and car rentals
3. Sophisticated end users – include engineers, scientists, business analysts, and others
who thoroughly familiarize themselves with the facilities of DBMS
4. Stand-alone users – maintain personal dbases by using ready-made program packages
that provide easy-to-use menu- or graphics-based interfaces
1. System analysts – determine the requirements of end-users and develop specifications for
canned transactions (using standard types of queries and updates)
2. Application programmers – implement these specifications as programs, then they test,
debug, document and maintain the canned transactions
Disadvantages of DBMS
1. Complexity
2. Size
3. Cost of DBMS
4. Additional hardware costs
5. Cost of conversion
6. Performance
7. Higher impact of a failure
5. Economies of Scale
DBMS approach permits consolidation of data and applications, thus reducing the
amount of wasteful overlap between activities of data-processing personnel in different
projects or departments.
WEEK 2
1. External Level – The user’s view of database. This level describes that part of the dbase that
is relevant to each user.
2. Conceptual level – The community view of the dbase. This level describes what data is stored
in the dbase and the relationships among data.
3. Internal level – The physical representation of the dbase on the computer. This level describes
how the data is stored in the dbase.
4. Physical level – lies below the internal level and may be managed by the operating system
under the direction of the DBMS.
– The physical level below the DBMS consists of items only the operating system knows, such
as exactly how the sequencing is implemented and whether the fields of internal records are
stored as contiguous bytes on the disk.
Dbase Schema – overall description of the dbase
Database Languages
2 Parts
2. Data Manipulation Language (DML) – is used to both read and update the database
a language that provides a set of operations to support the basic data manipulation
operations on the data held in the database
Example:
Employee
EmpNum  EmpName       DeptNum  Salary
2001    C. Cristobal  001      10000
2002    C. Malay      002      13000
2003    O. Ternida    001      15000
2004    N. Felicerta  002      11000
2005    E. Smith      003      12000
2006    R. Distor     003      13500

Department
DeptNum  DeptName     Budget
001      Courseware   40000
002      MIS          60000
003      Engineering  70000
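The Employee and Department tables above can be loaded into a small in-memory database to try out basic DML operations. The sketch below uses Python's standard sqlite3 module rather than a full DBMS. The lowercase table and column names are assumptions made for illustration, and DeptNum is stored as text so the leading zeros are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_num  TEXT PRIMARY KEY,
        dept_name TEXT,
        budget    INTEGER
    );
    CREATE TABLE employee (
        emp_num  INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_num TEXT REFERENCES department(dept_num),
        salary   INTEGER
    );
""")
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", [
    ("001", "Courseware", 40000),
    ("002", "MIS", 60000),
    ("003", "Engineering", 70000),
])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (2001, "C. Cristobal", "001", 10000),
    (2002, "C. Malay", "002", 13000),
    (2003, "O. Ternida", "001", 15000),
    (2004, "N. Felicerta", "002", 11000),
    (2005, "E. Smith", "003", 12000),
    (2006, "R. Distor", "003", 13500),
])

# Read: names of the employees who work in the MIS department.
mis_staff = [row[0] for row in conn.execute("""
    SELECT e.emp_name
    FROM employee e JOIN department d ON e.dept_num = d.dept_num
    WHERE d.dept_name = 'MIS'
    ORDER BY e.emp_num
""")]

# Update: raise one employee's salary by 500, then read it back.
conn.execute("UPDATE employee SET salary = salary + 500 WHERE emp_num = 2005")
new_salary = conn.execute(
    "SELECT salary FROM employee WHERE emp_num = 2005").fetchone()[0]

print(mis_staff, new_salary)
```

The join matches rows of the two tables on the shared DeptNum column, which is how the relational model links an employee to a department.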
Network data model - the records are organized as generalized graph structures
with records appearing as nodes and sets as edges in the graph.
Example:
Hierarchical data model - data is represented as a collection of records and relationships are
represented by sets; however, it allows a node (record) to have only one parent.
Files are arranged in a top-down structure that resembles a tree or
genealogy chart. It is a restricted type of network model.
Functions of DBMS
1. Data storage, retrieval, and update
A DBMS must furnish users with the ability to store, retrieve, and update data in the
database.
2. A user-accessible catalog
A DBMS must furnish a catalog in which descriptions of data items are stored and which is
accessible to users.
3. Transaction support
A DBMS must furnish a mechanism which will ensure either that all the updates
corresponding to a given transaction are made or that none of them is made.
4. Concurrency control services
A DBMS must furnish a mechanism to ensure that the database is updated correctly when
multiple users are updating the database concurrently.
5. Recovery services
A DBMS must furnish a mechanism for recovering the database in the event that the
database is damaged in any way.
6. Authorization services
A DBMS must furnish a mechanism to ensure that only authorized users can access the
database.
7. Support for data communication
A DBMS must be capable of integrating with communication software.
8. Integrity services
A DBMS must furnish a means to ensure that both data in the database and changes to the
data follow certain rules.
9. Services to promote data independence.
A DBMS must include facilities to support the independence of programs from the actual
structure of the database.
10. Utility services
A DBMS should provide a set of utility services.
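Transaction support (function 3 above) can be illustrated with a small all-or-nothing update. This is a minimal sketch using Python's sqlite3 module. The account table, its balances, and the transfer function are hypothetical and are not part of this manual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates are kept or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance, so the credit
        # that was already applied to dst has been rolled back as well.
        return False

first = transfer(conn, 1, 2, 30)    # enough funds
second = transfer(conn, 1, 2, 500)  # not enough funds
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(first, second, balances)
```

The second transfer fails halfway through, yet the balances stay as the first transfer left them, which is the behaviour the transaction-support function asks for.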
COMPONENTS OF DBMS
Relational Model – is based on the mathematical concept of a relation, which is physically represented
as a table
Attribute – a named column of a relation
Branch
BranchNo  Street        City      Postcode
B001      22 Deer Rd.   London    SW14EH
B002      16 Argyll     Aberdeen  AB23SU
B003      163 Main St.  Glasgow   GI19QX
B004      32 Manse Rd.  Bristol   BS99NZ

Staff (BranchNo is a foreign key that refers to Branch)
StaffNo  fName  lName   Position    Gender  DOB        Salary  BranchNo
SL21     John   White   Manager     M       1-Oct-75   30000   B001
SG37     Ann    Breech  Assistant   F       10-Nov-60  12000   B003
SG14     David  Ford    Supervisor  M       24-Mar-58  18000   B003
SG09     Mary   Howe    Assistant   F       19-Feb-70  9000    B004
Relational database
a collection of normalized relations with distinct relation names
consists of relations that are appropriately structured
Alternative Terminology
Formal Terms  Alternative 1  Alternative 2
Relation      Table          File
Tuple         Row            Record
Attribute     Column         Field
Database Relations
Relation schema – a named relation defined by a set of attribute and domain name pairs
Properties of Relations
the relation has a name that is distinct from all other relation names
each cell of relation contains exactly one atomic (single) value
each attribute has a distinct name
the values of an attribute are all from the same domain
each tuple is distinct, there are no duplicate tuples
the order of attributes has no significance
the order of tuples has no significance
Relational Keys
Superkey – an attribute, or set of attributes, that uniquely identifies a tuple within a relation
Candidate key – a superkey such that no proper subset is a superkey within the relation
A candidate key for a relation has two properties:
1. uniqueness – in each tuple of the relation, the values of the candidate key uniquely identify
that tuple
2. irreducibility – no proper subset of the candidate key has the uniqueness property
Primary key – the candidate key that is selected to identify tuples uniquely within the relation
Alternate keys – candidate keys that are not selected to be the primary key
Foreign key – an attribute, or set of attributes, within one relation that matches the candidate key of
some (possibly the same) relation
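The key definitions above can be tried out on cut-down versions of the Branch and Staff relations. This is a minimal sketch using Python's sqlite3 module. The lowercase column names and the try_insert helper are assumptions made for illustration, and SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
    CREATE TABLE branch (
        branch_no TEXT PRIMARY KEY,
        street    TEXT,
        city      TEXT,
        postcode  TEXT
    );
    CREATE TABLE staff (
        staff_no  TEXT PRIMARY KEY,
        l_name    TEXT,
        branch_no TEXT REFERENCES branch(branch_no)
    );
""")
conn.execute(
    "INSERT INTO branch VALUES ('B003', '163 Main St.', 'Glasgow', 'GI19QX')")
conn.execute("INSERT INTO staff VALUES ('SG37', 'Breech', 'B003')")

def try_insert(sql):
    """Run one INSERT and report whether the key constraints allowed it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# Primary key: a second tuple with staff_no SG37 would break uniqueness.
dup_key = try_insert("INSERT INTO staff VALUES ('SG37', 'Ford', 'B003')")
# Foreign key: branch B999 does not exist in the branch relation.
bad_ref = try_insert("INSERT INTO staff VALUES ('SG14', 'Ford', 'B999')")
# A new staff_no that refers to an existing branch is fine.
good = try_insert("INSERT INTO staff VALUES ('SG14', 'Ford', 'B003')")

print(dup_key, bad_ref, good)
```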
Relational Integrity
Null
Represents a value for an attribute that is currently unknown or is not applicable for this tuple.
Deals with incomplete or exceptional data.
Null represents the absence of a value and is not the same as zero or spaces, which are values.
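The difference between a null and the value zero shows up as soon as the data is queried. This is a minimal sketch using Python's sqlite3 module. The staff numbers S1 to S3 and their salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("S1", 9000), ("S2", 0), ("S3", None)])  # Python None is stored as NULL

# Zero is a value, so an ordinary comparison finds it.
zero_rows = conn.execute(
    "SELECT staff_no FROM staff WHERE salary = 0").fetchall()
# A null has to be tested with IS NULL.
null_rows = conn.execute(
    "SELECT staff_no FROM staff WHERE salary IS NULL").fetchall()
# Comparing with "= NULL" is never true, so no rows come back.
eq_null_rows = conn.execute(
    "SELECT staff_no FROM staff WHERE salary = NULL").fetchall()
# Aggregates skip nulls: the average is taken over 9000 and 0 only.
avg_salary = conn.execute("SELECT AVG(salary) FROM staff").fetchone()[0]

print(zero_rows, null_rows, eq_null_rows, avg_salary)
```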
Enterprise Constraints
Additional rules specified by users or database administrators.
e.g., the maximum number of staff in a branch.
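One possible way to enforce an enterprise constraint such as a staff limit is a trigger that rejects the offending insert. This is a sketch under stated assumptions, not a prescribed method. It uses Python's sqlite3 module and SQLite trigger syntax, and the limit of 2, the trigger name, and the hire helper are all hypothetical.

```python
import sqlite3

MAX_STAFF = 2  # hypothetical limit chosen for the example

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE staff (
        staff_no  TEXT PRIMARY KEY,
        branch_no TEXT NOT NULL
    );
    CREATE TRIGGER staff_limit
    BEFORE INSERT ON staff
    WHEN (SELECT COUNT(*) FROM staff
          WHERE branch_no = NEW.branch_no) >= {MAX_STAFF}
    BEGIN
        SELECT RAISE(ABORT, 'branch already has the maximum number of staff');
    END;
""")

def hire(staff_no, branch_no):
    """Try to add a staff member; return False if the rule blocks it."""
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?)", (staff_no, branch_no))
        return True
    except sqlite3.DatabaseError:
        return False

results = [
    hire("SG37", "B003"),
    hire("SG14", "B003"),
    hire("SG05", "B003"),  # third member of B003: over the limit
    hire("SL21", "B001"),  # a different branch is unaffected
]
print(results)
```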
Views
Base Relation – A named relation, corresponding to an entity in conceptual schema, whose
tuples are physically stored in database.
View – Dynamic result of one or more relational operations operating on the base relations to
produce another relation.
o A view is a virtual relation that does not actually exist in the database but is produced
upon request, at time of request. Contents of a view are defined as a query on one or
more base relations.
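The idea that a view is produced on request, rather than stored, can be seen by changing the base relation and querying the view again. This is a minimal sketch using Python's sqlite3 module. The view name staff_b003 and the extra staff member SG05 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_no TEXT PRIMARY KEY, l_name TEXT, "
    "salary INTEGER, branch_no TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?)", [
    ("SG37", "Breech", 12000, "B003"),
    ("SG14", "Ford", 18000, "B003"),
    ("SG09", "Howe", 9000, "B004"),
])

# The view is a stored query: staff at branch B003, with salary hidden.
conn.execute("""
    CREATE VIEW staff_b003 AS
    SELECT staff_no, l_name FROM staff WHERE branch_no = 'B003'
""")

before = conn.execute(
    "SELECT staff_no FROM staff_b003 ORDER BY staff_no").fetchall()

# Change the base relation only; the view itself is not touched.
conn.execute("INSERT INTO staff VALUES ('SG05', 'Brand', 24000, 'B003')")

after = conn.execute(
    "SELECT staff_no FROM staff_b003 ORDER BY staff_no").fetchall()
print(before, after)
```

The new row appears in the view without any change to the view definition, because the view's contents are recomputed from the base relation each time it is queried.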
WEEK 5
[Figure: stages of the database application lifecycle. Boxes shown: Requirements collection and
analysis; Database design (conceptual, logical, and physical database design); DBMS selection;
Application design; Prototyping (optional); Implementation; Data conversion and loading; Testing;
Operational maintenance.]
Database planning – the management activities that allow the stages of the database application
lifecycle to be realized as efficiently and effectively as possible.
System definition – specifying the scope and boundaries of the database application, including its
users, application areas, and major user views.
Requirements collection and analysis – collection and analysis of the requirements of users and
application areas.
Database design – The process of creating a design for a database that will support the enterprise’s
operations and objectives. It includes the conceptual, logical, and physical design of the database.
Conceptual Design – Data modeling is used to create an abstract database structure
that represents real-world objects in the most realistic way possible. The conceptual
model must embody a clear understanding of the business and its functional areas. At
this level of abstraction, the type of hardware and/or database model to be used might
not yet be identified. Therefore, the design must be software- and hardware-independent
so the system can be set up within any hardware or software platform chosen later.
Logical Design – Translates the conceptual design into the internal model for a selected
database management system (DBMS) such as DB2, SQL Server, Oracle, and Access.
Therefore, the logical design is software-dependent.
Physical Design – It is the process of selecting the data storage and data access
characteristics of the database. The storage characteristics are a function of the types
of devices supported by the hardware, the type of data access methods supported by
the system, and the DBMS. Physical design affects not only the location of the data in
the storage devices, but also the performance of the system.
DBMS selection (optional) – selecting a suitable DBMS for the database application.
Application design – designing the user interface and the application programs that use and process
the database.
Prototyping (optional) – building a working model of the database application, which allows the
designers or users to visualize and evaluate how the final system will look and function.
Implementation – creating the external, conceptual, and internal database definitions and the
application programs
Data conversion and loading – loading data from the old system to the new system
Testing – the database application is tested for errors and validated against the requirements
specified by the users.
Operational maintenance – the database application is fully implemented. The system is continuously
monitored and maintained. When necessary, new requirements are incorporated into the database
application through the preceding stages of the lifecycle.
User Views
Defines what is required of a database application from the perspective of a particular job role
(such as Manager or Supervisor) or enterprise application area (such as marketing, personnel,
or stock control).
The bottom-up approach begins at the fundamental level of attributes (that is, properties of entities and
relationships), which through analysis of the associations between attributes, are grouped into relations
that represent types of entities and relationships between entities. This approach is appropriate for the
design of simple databases with a relatively small number of attributes.
A more appropriate strategy for the design of complex databases is to use the top-down approach.
This approach starts with the development of data models that contain a few high-level entities and
relationships and then applies successive top-down refinements to identify lower-level entities,
relationships and the associated attributes.
Database design refers to the activities that focus on the design of the database structure that will be
used to store and manage end-user data. A good database – that is, a database that meets all user
requirements – does not just happen; its structure must be designed carefully. In fact, database design
is a crucial aspect of working with databases: even a good DBMS will perform poorly with a badly
designed database.
Proper database design requires the database designer to identify precisely the database’s expected
use. Designing a transactional database emphasizes accurate and consistent data and operational
speed.
WEEK 7
Entity-Relationship Modeling
Entity – in a database, anything about which information can be stored; for example, a person, concept,
physical object or event. Typically refers to a record structure.
May describe
Entity Sets
Entity Instances
Types of Attributes
[Figure: ER diagram notation showing an entity type with a composite attribute and its component
attributes, a derived attribute, and a multivalued attribute]
PLACES is a relationship
Types of Relationship
In a company, each division is managed by only one manager and each manager manages only one
division
[Figure: Department "managed by" Manager]
Among the automobile manufacturing companies, a company manufactures many cars, but a given car
is manufactured in only one company
[Figure: Company "manufactures" Car]
In a college, every student takes many courses and every course is taken by many students
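The company/car and student/course examples above can be sketched as tables to show how the two kinds of relationship differ in structure. This is only one common way to represent them, using Python's sqlite3 module. All table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One-to-many: every car row points to exactly one company.
    CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (
        car_id     INTEGER PRIMARY KEY,
        model      TEXT,
        company_id INTEGER NOT NULL REFERENCES company(company_id)
    );
    -- Many-to-many: each (student, course) pairing is a row of its own.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE takes (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO company VALUES (1, 'Company A');
    INSERT INTO car VALUES (100, 'Model X', 1), (101, 'Model Y', 1);
    INSERT INTO student VALUES (1, 'Student 1'), (2, 'Student 2');
    INSERT INTO course VALUES (10, 'Course 10'), (20, 'Course 20');
    INSERT INTO takes VALUES (1, 10), (1, 20), (2, 10);
""")

# One company manufactures many cars.
cars_per_company = dict(conn.execute(
    "SELECT company_id, COUNT(*) FROM car GROUP BY company_id"))
# A student takes many courses, and a course is taken by many students.
courses_per_student = dict(conn.execute(
    "SELECT student_id, COUNT(*) FROM takes GROUP BY student_id"))
students_per_course = dict(conn.execute(
    "SELECT course_id, COUNT(*) FROM takes GROUP BY course_id"))

print(cars_per_company, courses_per_student, students_per_course)
```

A one-to-one relationship, such as division and manager, could be sketched the same way by adding a UNIQUE constraint to the foreign key column.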
Degree of Relationship
[Figure: Person "is married to" Person]
2. Binary Relationship – relationship between the instances of two entity types. It is the
most common type of relationship encountered in the data modeling.
[Figure: "supplies" relationship among Vendor, Product, and Warehouse]
One instance of supplies might record the fact that vendor X can supply product C to
warehouse Y.
Cardinality Constraints
1. Minimum cardinality – the minimum number of instances of one entity that may be associated
with each instance of another entity
2. Maximum cardinality – the maximum number of instances of one entity that may be associated
with a single occurrence of another entity
Basic Relationship:
[Figure: Patient "has" Patient History]
Relationship Cardinality
1. Mandatory One
2. Mandatory Many
3. Optional One
4. Optional Many
WEEK 9
Normalization
a technique for producing a set of relations with desirable properties, given the data
requirements of an enterprise
a method of organizing data elements in a database into tables
Purpose of Normalization
supports database designers by presenting a series of tests, which can be applied to individual
relations so that a relational schema can be normalized to a specific form to prevent the
possible occurrence of update anomalies
Data redundancy and update anomalies. A major aim of relational database design is to group
attributes into relations so as to minimize data redundancy and thereby reduce the file storage
space required by the implemented base relations.
Data Anomalies
A data abnormality that exists when inconsistent changes to a database have been made. For
example, an employee moves, but the address change is corrected only in one file and not
across all files in the database.
1. An update anomaly is a data inconsistency that results from data redundancy and a partial
update. For example, each employee in a company has a department associated with them as
well as the student group they participate in.
If A. Bruchs’ department is in error, it must be updated at least 2 times or there will be inconsistent
data in the database. If the user performing the update does not realize that the data is stored
redundantly, the update will not be done properly.
2. A deletion anomaly is the unintended loss of data due to deletion of other data. For example, if
the student group Beta Alpha Psi disbanded and was deleted from the table above, J.
Longfellow and the Accounting department would cease to exist. This results in database
inconsistencies and is an example of how combining information that does not really belong
together into one table can cause problems.
3. An insertion anomaly is the inability to add data to the database due to absence of other data.
For example, assume Student_Group is defined so that null values are not allowed. If a new
employee is hired but not immediately assigned to a Student_Group then this employee could
not be entered into the database. This results in database inconsistencies due to omission.
Update, deletion, and insertion anomalies are very undesirable in any database. Anomalies are avoided
by the process of normalization.
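The three anomalies can be reproduced on a single unnormalized table. This is a minimal sketch using Python's sqlite3 module. The names A. Bruchs, J. Longfellow, Accounting, and Beta Alpha Psi come from the examples above; the table name emp_group, the department CIS, the groups Group X and Group Y, and the new employee N. Hire are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_group (employee TEXT, department TEXT, "
    "student_group TEXT NOT NULL)")
conn.executemany("INSERT INTO emp_group VALUES (?, ?, ?)", [
    ("A. Bruchs", "CIS", "Group X"),
    ("A. Bruchs", "CIS", "Group Y"),  # department stored a second time
    ("J. Longfellow", "Accounting", "Beta Alpha Psi"),
])

# Update anomaly: a partial update fixes only one of the redundant rows.
conn.execute(
    "UPDATE emp_group SET department = 'Marketing' "
    "WHERE employee = 'A. Bruchs' AND student_group = 'Group X'")
bruchs_departments = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT department FROM emp_group "
    "WHERE employee = 'A. Bruchs'"))

# Deletion anomaly: removing the group also removes the only record
# of J. Longfellow.
conn.execute("DELETE FROM emp_group WHERE student_group = 'Beta Alpha Psi'")
longfellow_rows = conn.execute(
    "SELECT COUNT(*) FROM emp_group "
    "WHERE employee = 'J. Longfellow'").fetchone()[0]

# Insertion anomaly: an employee without a group cannot be stored.
try:
    conn.execute("INSERT INTO emp_group VALUES ('N. Hire', 'Accounting', NULL)")
    inserted = True
except sqlite3.IntegrityError:
    inserted = False

print(bruchs_departments, longfellow_rows, inserted)
```

After the partial update A. Bruchs is listed under two departments at once, which is exactly the inconsistency that storing each fact in only one place would avoid.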
NOTE: The term first normal form (1NF) describes the tabular format in which:
All of the key attributes are defined.
There are no repeating groups in the table. In other words, each row/column
intersection contains one and only one value, not a set of values.
All attributes are dependent on the primary key.
CS 221 – DATABASE MANAGEMENT SYSTEM Page 27 of 41
OUR LADY OF FATIMA UNIVERSITY
COLLEGE OF COMPUTER STUDIES
WEEK 13
The methodology started by producing a local conceptual data model and then derived a set of
relations to produce a local logical data model. The logical database design phase concluded by
merging together the local data models.
In presenting a database design methodology we divide the design process into three main phases:
conceptual, logical, and physical database design. The phase prior to physical design, namely logical
database design, is largely independent of implementation details, such as the specific functionality of
the target DBMS and application programs, but is dependent on the target data model. The output of
this process is a global logical model consisting of an ER/relation diagram, relational schema, and
supporting documentation that describes this model, such as a data dictionary. Together, these
represent the sources of information for the physical design process, and they provide the physical
database designer with a vehicle for making tradeoffs that are so important to an efficient database
design.
Whereas logical database design is concerned with the what, physical database design is concerned
with the how. It requires different skills that are often found in different people. In particular, the
physical database designer must know how the computer system hosting the DBMS operates, and
must be fully aware of the functionality of the target DBMS. As the functionality provided by current
systems varies widely, physical design must be tailored to a specific DBMS. However, physical database
design is not an isolated activity – there is often feedback between physical, logical, and application
design. For example, decisions taken during physical design for improving performance, such as
merging relations together, might affect the structure of the logical data model, which will have an
associated effect on the application design.
1. To develop a logical database, analyze the business of the organization that the database would
support, how the operations relate to each other, and what data is used in business operations.
After this analysis, model the data. This modeling involves studying data usage and grouping
data elements into logical units so that a task supported by one or more organizational units is
independent of support provided for other tasks.
2. By providing each task with its own data groups, changes in the data requirements of one task
will have minimal, if any, impact on data provided for another task. By having data managed as
a synthesis, data redundancy is minimized and data consistency among tasks and activities is
improved. The figure below graphically expresses this point.
3. Logical database design comprises two methods. The first method is used to analyze the
business performed by an organization. Following this analysis, the second method is used to
model the data that supports the business. These methods are:
A. Business Analysis
B. Data Modeling
WEEK 14
WEEK 15
SQL
Stands for Structured Query Language
A standard computer language for accessing and manipulating databases
Can execute queries against a database
Can retrieve data from a database
Can insert new records to a database
Can delete records from a database
Can update records in a database
2. Creating a database does not select it for use; you must do that explicitly. To make company the
current database, use this command:
mysql> USE company
mysql> CREATE TABLE emp (empno INT(2) PRIMARY KEY, lname VARCHAR(15), fname
VARCHAR(15), category VARCHAR(1), rate DECIMAL(5,2));
4. Once you have created a table, SHOW TABLES should produce some output:
5. To verify that your table was created the way you expected, use a DESCRIBE statement:
Count(*)
4

category  Count(*)
A         1
B         2
C         1
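Result tables of this shape come from counting queries. The sketch below assumes the queries were a plain COUNT(*) and a COUNT(*) grouped by category, since the statements themselves are not shown here. It runs against SQLite through Python's sqlite3 module rather than MySQL, and the four emp rows are hypothetical, chosen only so that the counts match the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emp (empno INT(2) PRIMARY KEY, lname VARCHAR(15),
                      fname VARCHAR(15), category VARCHAR(1),
                      rate DECIMAL(5,2))
""")
# Hypothetical sample rows: one in category A, two in B, one in C.
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", [
    (1, "Lastname1", "Firstname1", "A", 150.00),
    (2, "Lastname2", "Firstname2", "B", 120.50),
    (3, "Lastname3", "Firstname3", "B", 120.50),
    (4, "Lastname4", "Firstname4", "C", 95.75),
])

# Total number of rows in the table.
total = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

# Number of rows in each category.
per_category = conn.execute("""
    SELECT category, COUNT(*)
    FROM emp
    GROUP BY category
    ORDER BY category
""").fetchall()

print(total)
print(per_category)
```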
WEEK 17
Database Issues
Data is a valuable resource that must be strictly controlled and managed, as with any corporate resource.
Part or all of the corporate data may have strategic importance to an organization and should therefore
be kept secure and confidential.
Integrity
Integrity constraints also contribute to maintaining a secure database system by preventing data from
becoming invalid, and hence giving misleading or incorrect results.
Database Security
- mechanisms that protect the database against intentional or accidental threats.
Security considerations apply not only to the data held in a database: breaches of security may affect
other parts of the system, which may in turn affect the database.
Transaction Support
A transaction is an action, or series of actions, carried out by a single user or application program,
which reads or updates the contents of the database.
A transaction should always transform the database from one consistent state to another.
Concurrency Control
The process of managing simultaneous operations on the database without having them interfere with
one another.
A major objective in developing a database is to enable many users to access shared data
concurrently. Concurrent access is relatively easy if all users are only reading data, as there is no way
that they can interfere with one another. However, when two or more users are accessing the
database simultaneously and at least one is updating data, there may be interference that can result in
inconsistencies.
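The kind of interference described above is often called a lost update. The sketch below reproduces one in plain Python, without a database, and then repeats the same two updates under a lock. The starting balance and amounts are hypothetical, and the lock merely stands in for whatever concurrency control mechanism a DBMS actually uses.

```python
import threading

balance = {"value": 100}

# Uncontrolled interleaving: both users read before either one writes.
read_by_t1 = balance["value"]
read_by_t2 = balance["value"]
balance["value"] = read_by_t1 + 50  # T1 deposits 50
balance["value"] = read_by_t2 - 30  # T2 withdraws 30 and overwrites T1
lost_update_result = balance["value"]

# Controlled access: each read-modify-write runs as one indivisible step.
balance["value"] = 100
lock = threading.Lock()

def apply_change(delta):
    """Read the balance, change it, and write it back while holding the lock."""
    with lock:
        current = balance["value"]
        balance["value"] = current + delta

threads = [threading.Thread(target=apply_change, args=(delta,))
           for delta in (50, -30)]
for t in threads:
    t.start()
for t in threads:
    t.join()
controlled_result = balance["value"]

print(lost_update_result, controlled_result)
```

In the first run the deposit disappears and the balance ends at 70; with the lock, both updates survive and the balance ends at 120 regardless of which thread runs first.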
Recoverability
Serializability identifies schedules that maintain the consistency of the database, assuming that none of
the transactions in the schedule fails. An alternative perspective examines the recoverability of
transactions within a schedule. If a transaction fails, the atomicity property requires that the effects of
the transaction be undone. In addition, the durability property states that once a transaction commits,
its changes cannot be undone.