CSC212 Lesson One
(3 Units)
Lecture Material by Dr.(Mrs) E. Omede
LESSON ONE
INTRODUCTION
In this course we will study the concepts of data and database, and other related terms used in
Database Management.
THE OBJECTIVES
The objectives of this course are:
o To understand data organization
o To understand the problems solved by databases
o To understand the objectives of a database (data-related goals)
o To understand database modelling
What is a Database?
To understand the concept of database, we have to first look at the fundamental principles of
database.
Data
Data is the building block of any database. It consists of distinct pieces of information that may
not make sense independently, for example: 1, Ejiro, Computer Science, 18. Data is usually formatted
in a special way. All software is divided into two general categories: data and programs. Programs
are collections of instructions for manipulating data.
Data can exist in a variety of forms: as numbers (0-9) or text on pieces of paper, as bits (0, 1) and
bytes (10001100) stored in electronic memory, or as facts stored in a person's mind.
Strictly speaking, data is the plural of datum, a single piece of information. In practice, however,
people use data as both the singular and plural form of the word.
The term data is also often used to distinguish binary machine-readable information from textual
human-readable information. For example, some applications make a distinction between data files
(files that contain binary data) and text files (files that contain ASCII data).
In database management systems, data files are the files that store the database information,
whereas other files, such as index files and data dictionaries, store administrative information, known
as metadata (data about data).
Data Organization
Bit: A bit is the smallest unit of data. The term is coined from binary digit. A bit holds one of two
values, for example 0 or 1, T or F, High or Low, On or Off. In a database, this data type is known as
Boolean.
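Bytes:
A byte is a unit of data that consists of 8 bits on virtually all modern computers. The term is a
deliberate respelling of the word bite, chosen to avoid confusion with bit. One byte can represent a
character such as “A”, “B”, “2”, and so on; depending on the data representation scheme, a single
character may occupy 8, 16 or 32 bits.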
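The relationship between a character, its byte and its bits can be sketched in a few lines of Python. This is only an illustration; the variable names are chosen for this example and the Naira sign is used simply as a character that lies outside ASCII.

```python
# Illustrative sketch: how one character maps to a byte and to bits.
char = "A"
code_point = ord(char)               # numeric code of the character
bits = format(code_point, "08b")     # the same code written as 8 bits
as_bytes = char.encode("ascii")      # an ASCII character fits in one byte

# A character outside ASCII needs more than one byte in UTF-8.
naira = "\u20a6"                     # the Naira sign
naira_bytes = naira.encode("utf-8")

print(bits, len(as_bytes), len(naira_bytes))
```

The same character can therefore take a different number of bytes under a different encoding, which is why the size of a text field depends on the representation scheme in use.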
Field
A field is a group of related characters or bytes: a space allocated for a particular item of
information. A student form, for example, contains a number of fields: Sn, Name, Department, Age
and so on.
Sn   Name    Department         Age
1    Ejiro   Computer Science   18
Each column of the form (Sn, Name, Department, Age) is a field, and the row of values is a record.
In database systems, fields, which are also known as attributes, are the smallest units of
information you can access.
Most fields have certain properties associated with them. For example, some fields are numeric
whereas others are textual, some are long, while others are short. In addition, every field has a
name, called the field name.
In database management systems, a field can be required, optional, or calculated. A required field
is one in which you must enter data, while an optional field is one you may leave blank. A
calculated field is one whose value is derived from some formula involving other fields. You do not
enter data into a calculated field; the system automatically determines the correct value. A
collection of fields is called a Record.
Record
A record is a complete set of information about one item, made up of a collection of related fields,
each of which contains one item of information. A set of records constitutes a file. For example, a
student file might contain records that have four fields: a Sn field, a Name field, a Department
field and an Age field, as shown in the table above.
In relational database management systems, records are called tuples.
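As a rough sketch of how fields, records and files fit together, the student form can be modelled in Python. The class name StudentRecord, the choice to make Age optional, the calculated field is_adult and the second record are assumptions made for illustration; they are not part of the course material.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudentRecord:
    """One record: a collection of related fields."""
    sn: int                      # required field
    name: str                    # required field
    department: str              # required field
    age: Optional[int] = None    # optional field (assumed): may be left blank

    @property
    def is_adult(self) -> Optional[bool]:
        """Calculated field: derived from age, never entered directly."""
        return None if self.age is None else self.age >= 18


# A file (or table) is a collection of records.
student_file = [
    StudentRecord(1, "Ejiro", "Computer Science", 18),
    StudentRecord(2, "Oghene Paul", "Computer Science"),  # hypothetical record, age blank
]

for record in student_file:
    print(record.name, record.is_adult)
```

The calculated field is never stored: it is worked out from the age field each time it is read, which is the behaviour described for calculated fields.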
File
A file is a collection of data records or information that has a name, called the filename. Almost all
information stored in a computer must be in a file. There are many different types of files: data
files, text files, program files, directory files, and so on. Different types of files store different types
of information. For example, program files store programs, whereas text files store text. In a database,
the tables (or relations) that are created, which consist of fields and records, are also regarded as files.
Database
A database is a collection of interrelated data organized in such a way that a computer program can
quickly select desired pieces of data. Put simply, it is a collection of related tables. You can
think of a database as an electronic filing system. Traditional databases are organized by fields,
records, and files.
The collection of three tables T1, T2 and T3 can be called a “related collection” because common
attributes exist in selected pairs of tables. Using these common attributes, we can combine the data
of two or more tables to find the complete details of a student. For example, the hostel of the
eldest student can be found even though the Age and Hostel attributes are in different tables.
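The idea of combining tables through a common attribute can be sketched with Python's built-in sqlite3 module. The columns of T1, T2 and T3 and all of the sample rows below are assumptions made for this example: the sketch supposes that each table carries a shared matno attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layouts: every table shares the matno attribute.
    CREATE TABLE T1 (matno TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE T2 (matno TEXT PRIMARY KEY, age INTEGER);
    CREATE TABLE T3 (matno TEXT PRIMARY KEY, hostel TEXT);
    INSERT INTO T1 VALUES ('S001', 'Ejiro'), ('S002', 'Paul');
    INSERT INTO T2 VALUES ('S001', 18), ('S002', 21);
    INSERT INTO T3 VALUES ('S001', 'Hall A'), ('S002', 'Hall B');
""")

# Join the three tables on the common attribute, then pick the eldest student.
eldest = conn.execute("""
    SELECT T1.name, T2.age, T3.hostel
    FROM T1
    JOIN T2 ON T2.matno = T1.matno
    JOIN T3 ON T3.matno = T1.matno
    ORDER BY T2.age DESC
    LIMIT 1
""").fetchone()
conn.close()

print(eldest)
```

Age lives only in T2 and Hostel only in T3, yet the query can relate them because both tables can be matched to T1 through matno.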
Advantages of a Database Management System (DBMS):
i Control of Redundancy: Centralizing the database and its control reduces data
redundancy (i.e. storing the same data multiple times). This saves storage space and
the extra time needed to process duplicated data.
ii Improved Data Sharing: A DBMS allows a user to share the data among any number
of application programs.
iii Security: Having complete authority over the operational data enables the database
administrator (DBA) to ensure that the only means of access to the database is through
proper channels. The DBA can define authorization checks to be carried out whenever
access to sensitive data is attempted.
iv Efficient Data Access: In a database system, the data is managed by the DBMS and
all access to the data is through the DBMS, which is the key to effective data processing.
v Enforcement of Standards: With centralized data, the DBA can establish and
enforce data standards, which may include naming conventions, data quality
standards, etc.
vi Data Independence: In a database system, the database management system provides the
interface between the application programs and the data. When changes are made to the
data representation, the metadata maintained by the DBMS is changed, but the DBMS
continues to provide the data to application programs in the previously used way. The
DBMS handles the task of transforming the data wherever necessary.
vii Reduced Application Development and Maintenance Time: A DBMS supports many
important functions that are common to many applications that access data stored in the
database, which facilitates quick development of applications.
Disadvantages of DBMS
i A DBMS supports many functions in order to serve its users well, which makes the
underlying software complex. Designers and developers therefore need thorough
knowledge of the software to get the best out of it.
ii It consumes a large amount of memory to run efficiently, because of its complexity and functionality.
iii A DBMS works as a centralized system, hence any failure of the DBMS will
affect all the users who are accessing it, wherever they are.
iv A DBMS is general-purpose software, i.e. it is written to work for many kinds of systems rather
than for one specific system. Hence some applications will run slowly.
Typical Applications of Database
➢ Student Records
➢ Hotel Booking
➢ Library
➢ Maintenance Information System
➢ Banking System
➢ Sales Records
DBMS ARCHITECTURE:
Database Model
A database model is a collection of logical constructs used to represent the data structure and data
relationships found within the database. A data model is not just a way of structuring data: it also
defines a set of operations that can be performed on the data. The relational model, for example,
defines operations such as selection, projection, and join. Although these operations may not be
explicit in a particular query language, they provide the foundation on which a query language is built.
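The three relational operations named above can be sketched over plain Python lists of dictionaries, where each dictionary stands for one row. The function names, the dept_id column and the sample rows are hypothetical and exist only for this illustration; a real DBMS implements these operations very differently.

```python
def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Projection: keep only the named columns of every row.
    (Strict relational algebra would also drop duplicate rows.)"""
    return [{c: row[c] for c in columns} for row in rows]


def join(left, right, column):
    """Join: pair up rows whose values in the shared column match."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]


# Hypothetical sample data.
students = [
    {"sn": 1, "name": "Ejiro", "dept_id": 10},
    {"sn": 2, "name": "Paul", "dept_id": 20},
]
departments = [
    {"dept_id": 10, "department": "Computer Science"},
    {"dept_id": 20, "department": "Physics"},
]

joined = join(students, departments, "dept_id")
cs_only = select(joined, lambda row: row["department"] == "Computer Science")
result = project(cs_only, ["name", "department"])
print(result)
```

A query language such as SQL expresses the same pipeline declaratively: the WHERE clause corresponds to selection, the column list to projection and the JOIN clause to join.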
Entities
Entities are the principal data objects about which information is to be collected. Entities are usually
recognizable concepts, either concrete or abstract, such as persons, places, things, or events, which
have relevance to the database. Some specific examples of entities are EMPLOYEES, PROJECTS,
INVOICES. An entity is similar to a table in the relational model.
Entities are classified as independent or dependent (in some methodologies, the terms used are strong
and weak, respectively). An independent entity is one that does not rely on another for identification.
A dependent entity is one that relies on another for identification.
An entity occurrence (also called an instance) is an individual occurrence of an entity. An
occurrence is similar to a row in the relational table.
Relationships
A relationship represents an association between two or more entities. Examples of relationships
include:
o employees are assigned to projects
o projects have subtasks
o departments manage one or more projects
Relationships are classified in terms of degree, connectivity, cardinality, and existence. These
concepts will be discussed below.
Attributes
Attributes describe the entity with which they are associated. A particular instance of an attribute is a
value. For example, "Oghene Paul" is one value of the attribute Name. The domain of an attribute
is the collection of all possible values the attribute can have. The domain of Name is the set of
character strings.
Attributes can be classified as identifiers or descriptors. Identifiers, more commonly called keys,
uniquely identify an instance of an entity. A descriptor describes a non-unique characteristic of an
entity instance.
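The difference between an identifier and a descriptor can be sketched with sqlite3. The student table, the matno column and the sample values are assumptions for this example: matno plays the role of the key, while name is a descriptor that is allowed to repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        matno TEXT PRIMARY KEY,   -- identifier (key): must be unique
        name  TEXT NOT NULL       -- descriptor: values may repeat
    )
""")
conn.execute("INSERT INTO student VALUES ('S001', 'Oghene Paul')")
conn.execute("INSERT INTO student VALUES ('S002', 'Oghene Paul')")  # same name is accepted

try:
    # A second row with an existing key value must be refused.
    conn.execute("INSERT INTO student VALUES ('S001', 'Ejiro')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
conn.close()

print(duplicate_rejected, row_count)
```

Two students may share a name, but no two rows may share a key value, so the key alone is enough to pick out one entity instance.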
Classifying Relationships
Relationships are classified by their degree, connectivity, cardinality, direction, type, and existence.
Not all modelling methodologies use all these classifications.
Degree of a Relationship
The degree of a relationship is the number of entities associated with the relationship. The n-ary
relationship is the general form for degree n. Special cases are the binary and ternary relationships,
where the degree is 2 and 3, respectively.
A binary relationship, the association between two entities, is the most common type in the real
world. A recursive binary relationship occurs when an entity is related to itself. An example might
be "some employees are married to other employees".
A ternary relationship involves three entities and is used when a binary relationship is inadequate.
Many modelling approaches recognize only binary relationships. Ternary or n-ary relationships are
decomposed into two or more binary relationships.
Direction
The direction of a relationship indicates the originating entity of a binary relationship. The entity
from which a relationship originates is the parent entity; the entity where the relationship
terminates is the child entity.
The direction of a relationship is determined by its connectivity. In a one-to-one relationship the
direction is from the independent entity to a dependent entity. If both entities are independent, the
direction is arbitrary. With one-to-many relationships, the entity occurring once is the parent. The
direction of many-to-many relationships is arbitrary.
Type
An identifying relationship is one in which the child entity is also a dependent entity. A
non-identifying relationship is one in which both entities are independent.
Existence
Existence denotes whether the existence of an entity instance is dependent upon the existence of
another, related, entity instance. The existence of an entity in a relationship is defined as either
mandatory or optional. If an instance of an entity must always occur for an entity to be included in
a relationship, then it is mandatory. An example of mandatory existence is the statement "every
project must be managed by a single department". If the instance of the entity is not required, it is
optional. An example of optional existence is the statement, "employees may be assigned to work
on projects".
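One common way to express mandatory and optional existence in a relational schema is through NOT NULL and nullable foreign keys. The sketch below uses sqlite3; the table and column names and the sample rows are assumptions, and giving each employee a single project_id is a deliberate simplification of "assigned to projects".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project (
        project_id INTEGER PRIMARY KEY,
        title      TEXT,
        -- mandatory: every project must be managed by a department
        dept_id    INTEGER NOT NULL REFERENCES department(dept_id)
    );
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT,
        -- optional: an employee may or may not be assigned to a project
        project_id INTEGER REFERENCES project(project_id)
    );
    INSERT INTO department VALUES (1, 'Research');
    INSERT INTO project VALUES (100, 'Payroll', 1);
    INSERT INTO employee VALUES (1, 'Ejiro', 100);
    INSERT INTO employee VALUES (2, 'Paul', NULL);
""")

try:
    # A project without a managing department violates mandatory existence.
    conn.execute("INSERT INTO project VALUES (101, 'Orphan', NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

unassigned = conn.execute(
    "SELECT name FROM employee WHERE project_id IS NULL"
).fetchall()
conn.close()

print(orphan_rejected, unassigned)
```

The project without a department is refused, while the employee without a project is stored normally, mirroring the two example statements.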
Generalization Hierarchies
A generalization hierarchy is a form of abstraction that specifies that two or more entities that share
common attributes can be generalized into a higher-level entity type called a supertype or generic
entity. The lower-level entities become the subtypes, or categories, of the supertype. Subtypes are
dependent entities.
Generalization occurs when two or more entities represent categories of the same real-world object.
For example, Wages_Employees and Classified_Employees represent categories of the same entity,
Employees. In this example, Employees would be the supertype; Wages_Employees and
Classified_Employees would be the subtypes.
Subtypes can be either mutually exclusive (disjoint) or overlapping (inclusive). A mutually exclusive
category is when an entity instance can be in only one category. The above example is a mutually
exclusive category. An employee can either be wages or classified but not both. An overlapping
category is when an entity instance may be in two or more subtypes. An example would be a person
who works for a university could also be a student at that same university. The completeness
constraint requires that all instances of the subtype be represented in the supertype.
Generalization hierarchies can be nested. That is, a subtype of one hierarchy can be a supertype of
another. The level of nesting is limited only by the constraint of simplicity. Subtype entities may be
the parent entity in a relationship but not the child.
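The disjoint, overlapping and completeness ideas can be checked with ordinary Python sets, treating each entity as the set of its instances. The identifiers (E1, P1 and so on) are hypothetical, and the completeness check follows the wording used here: every subtype instance must also appear in the supertype.

```python
# Supertype and its two subtypes, each represented as a set of instance IDs.
employees = {"E1", "E2", "E3"}
wages_employees = {"E1", "E2"}
classified_employees = {"E3"}

# Mutually exclusive (disjoint): no instance belongs to both subtypes.
disjoint = wages_employees.isdisjoint(classified_employees)

# Completeness as stated in the text: every subtype instance is in the supertype.
complete = (wages_employees | classified_employees) <= employees

# Overlapping (inclusive): a person may be both staff and student.
university_staff = {"P1", "P2"}
university_students = {"P2", "P3"}
overlapping = not university_staff.isdisjoint(university_students)

print(disjoint, complete, overlapping)
```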
ER Notation
There is no standard for representing data objects in entity-relationship (ER) diagrams. Each modelling
methodology uses its own notation. The original notation used by Chen is widely used in academic texts
and journals but rarely seen in either CASE tools or publications by non-academics. Today, a
number of notations are in use; among the more common are Bachman, crow's foot, and IDEF1X.
All notational styles represent entities as rectangular boxes and relationships as lines connecting
boxes. Each style uses a special set of symbols to represent the cardinality of a connection. The
notation used in this document is from Martin. The symbols used for the basic ER constructs are:
• entities are represented by labelled rectangles. The label is the name of the entity. Entity
names should be singular nouns.
• relationships are represented by a solid line connecting two entities. The name of the
relationship is written above the line. Relationship names should be verbs.
• attributes, when included, are listed inside the entity rectangle. Attributes which are
identifiers are underlined. Attribute names should be singular nouns.
• cardinality of many is represented by a line ending in a crow's foot. If the crow's foot is
omitted, the cardinality is one.
• existence is represented by placing a circle or a perpendicular bar on the line. Mandatory
existence is shown by the bar (looks like a 1) next to the entity for which an instance is required.
Optional existence is shown by placing a circle next to the entity that is optional. Examples
of these symbols are shown in Figure 2 below:
[Class diagram: Person, with attributes Name and Age and method SetName()]
Objects
Real-world entities and situations are represented as objects; an object is an instance of a class. An
object consists of data and its relationships. It encapsulates data and code in a single unit, which provides
data abstraction by hiding the implementation details from the user. Real-world problems are represented
as objects with different attributes. In Figure 4, the instances of Student, Doctor and Engineer are objects.
Class
A class is a collection of similar objects with a shared structure (attributes and methods). Related
attributes and methods are grouped together as a class, and an object is an instance of the class. In
Figure 3, Student, Doctor and Engineer are classes under the parent class Person.
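Let us illustrate class and object with one of these classes, using an object-oriented programming language:
class Student
{
    char name[30];     // student's name
    char matno[10];    // student's Matno
    // ... other attributes ...
public:
    void search();     // declared here; defined elsewhere
    void update();     // declared here; defined elsewhere
};
In the above illustration, Student is a class, while S1, S2, ..., Sn are objects of that class, which
can be created in the main function.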
Inheritance
A new class can be derived from an existing class. The derived class contains the attributes and methods
of the parent class together with its own. For instance, the classes Student, Doctor and Engineer
inherit from the base class Person.
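Inheritance can be sketched in Python, chosen here for brevity. The Person attributes follow the Name, Age and SetName() members mentioned for the Person class; the matno attribute of Student and the sample values are assumptions for this example.

```python
class Person:
    """Base (parent) class."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def set_name(self, name):
        self.name = name


class Student(Person):
    """Derived class: inherits name, age and set_name, and adds matno."""

    def __init__(self, name, age, matno):
        super().__init__(name, age)
        self.matno = matno   # attribute specific to Student (assumed)


s1 = Student("Ejiro", 18, "S001")   # s1 is an object (instance) of Student
s1.set_name("Ejiro O.")             # method inherited from Person
print(s1.name, s1.age, s1.matno)
```

Student never defines set_name itself; the method is available because Student is derived from Person.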
Implementation Category:
This is concerned with how data are represented in the database. In this category we have three
models namely:
- Hierarchical Model
- Network Model
- Relational Model
Hierarchical model
The hierarchical data model organizes data in a tree-like structure. There is a hierarchy of parent and
child data segments. This structure stores data as records which are connected to each other
through links. A record is a collection of fields, with each field containing only one value. The
fields in a record are determined by the type of the record. A record can have repeating information,
generally in the child data segments. Record types are the equivalent of tables in the relational
model, with the individual records being the equivalent of rows. To create links between these
record types, the hierarchical model uses parent-child relationships, which are 1:N mappings
between record types. These are represented using trees, a structure "borrowed" from mathematics
in the same way that the relational model borrows set theory.
For example, an organization might store information about an employee, such as name, employee
number, department and salary. The organization might also store information about an employee's
children, such as name and date of birth. The employee and children data form a hierarchy, where
the employee data represents the parent segment and the children data represents the child segment.
If an employee has three children, then there would be three child segments associated with one
employee segment. In a hierarchical database the parent-child relationship is one to many. This
restricts a child segment to having only one parent segment.
Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's Information
Management System (IMS) DBMS, through the 1970s.
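The employee and children example can be sketched as a small tree in Python. The Segment class and the helper path_to_root are hypothetical names for this illustration and do not reflect how IMS or any real hierarchical DBMS stores data.

```python
class Segment:
    """A data segment that has at most one parent and any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # a child segment has exactly one parent
        self.children = []        # a parent may have many child segments
        if parent is not None:
            parent.children.append(self)


def path_to_root(segment):
    """Follow the parent links from a segment up to the root."""
    names = []
    while segment is not None:
        names.append(segment.name)
        segment = segment.parent
    return names


employee = Segment("Employee: Ejiro")
children = [Segment(f"Child {i}", parent=employee) for i in (1, 2, 3)]

print(len(employee.children), path_to_root(children[0]))
```

Because each segment stores a single parent link, the one-to-many restriction is built into the structure: a child can be reached from only one parent.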
Hierarchical Structure
[Figure: tree with root segment A; Level 1 segments B, C and D; Level 2 segments B1, B2, C1, C2, C3 and D1]
Advantages
i. It promotes data security
ii. It promotes data independence
iii. It promotes data integrity (parent/child relationship)
iv. It is useful for large databases
v. It is useful when users require a lot of transactions
vi. It is suitable for large storage media.
Disadvantages
i. Requires knowledge of the physical level of data storage.
ii. Cannot handle the case where a part may belong to two or more components.
iii. New relations and nodes result in complex system management tasks.
iv. Modifications to the data structure lead to significant modifications to application programs.
v. It has no specific or precise standard.
vi. Does not easily provide the ad-hoc query capability that users favour.
Network Model
The popularity of the network data model coincided with the popularity of the hierarchical data
model. Some data were more naturally modelled with more than one parent per child. So, the
network model permitted the modelling of many-to-many relationships in data. In 1971, the
Conference on Data Systems Languages (CODASYL) formally defined the network model. The
basic data modelling construct in the network model is the set construct. A set consists of an
owner record type, a set name, and a member record type. A member record type can have that
role in more than one set, hence the multi-parent concept is supported. An owner record type can
also be a member or owner in another set. The data model is a simple network, and link and
intersection record types (called junction records by IDMS) may exist, as well as sets between
them. Thus, the complete network of relationships is represented by several pairwise sets; in each
set exactly one record type is the owner (at the tail of the relationship arrow) and one or more record
types are members (at the head of the relationship arrow). Usually, a set defines a 1:M relationship,
although 1:1 is permitted. The CODASYL network model is based on mathematical set theory.
Examples of Network Model
Consider the following record types: SalesRep, Invoice, Product, InvoiceLine, Customer
- A SalesRep may have written many invoices. Each invoice is written by a single SalesRep.
- A customer may have made purchases on many occasions. Each occasion corresponds to one
invoice.
- An invoice may have many invoice lines. Each invoice line belongs to a single invoice.
- A product may appear on several invoice lines. Each invoice line contains only a single
product.
[Figure: network diagram of the SalesRep, Customer, Invoice, Product and InvoiceLine record types]
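The set construct, and the way it lets one member record have more than one owner, can be sketched in Python. The NetworkSet class, the set names and the record identifiers are assumptions for this illustration and are not CODASYL syntax.

```python
from collections import defaultdict


class NetworkSet:
    """One set: a set name, an owner record type and a member record type."""

    def __init__(self, name, owner_type, member_type):
        self.name = name
        self.owner_type = owner_type
        self.member_type = member_type
        self.members = defaultdict(list)   # owner record -> its member records

    def connect(self, owner, member):
        self.members[owner].append(member)

    def owner_of(self, member):
        """Within one set, a member record has a single owner."""
        for owner, members in self.members.items():
            if member in members:
                return owner
        return None


# InvoiceLine is the member record type in two different sets.
invoice_lines = NetworkSet("INVOICE_LINES", "Invoice", "InvoiceLine")
product_lines = NetworkSet("PRODUCT_LINES", "Product", "InvoiceLine")

invoice_lines.connect("INV-1", "LINE-1")
invoice_lines.connect("INV-1", "LINE-2")
product_lines.connect("Be001", "LINE-1")
product_lines.connect("Me003", "LINE-2")

print(invoice_lines.owner_of("LINE-1"), product_lines.owner_of("LINE-1"))
```

LINE-1 is owned by an invoice in one set and by a product in the other, which is the multi-parent arrangement that the hierarchical model cannot express.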
Advantages
- Improves on the hierarchical model
- An application can access an owner record and all the member records in the set
- Movement from one owner to another is easier
- Promotes data integrity because of the required owner-member relationship.
Disadvantages
- Difficult to design and use properly.
- The user and the programmer must be familiar with the data structure.
- Does not promote structural independence.
- Navigational data access problems.
Relational Model
A relational database management system (RDBMS) is a database system based on the relational model
developed by E.F. Codd. A relational database allows the definition of data structures, storage and
retrieval operations and integrity constraints. In such a database the data and the relations between
them are organized in tables. A table is a collection of records, and each record in a table contains
the same fields.
Properties of Relational Tables:
➢ Values Are Atomic
➢ Each Row is Unique
➢ Column Values Are of the Same Kind
➢ The Sequence of Columns is Insignificant
➢ The Sequence of Rows is Insignificant
➢ Each Column Has a Unique Name
Certain fields may be designated as keys, which means that searches for specific values of that field
will use indexing to speed them up. Where fields in two different tables take values from the same
set, a join operation can be performed to select related records in the two tables by matching values
in those fields. Often, but not always, the fields will have the same name in both tables. For example,
an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs. To calculate a given customer's bill, you would sum the prices
of all products ordered by that customer by joining on the product-code fields of the two tables.
This can be extended to joining multiple tables on multiple fields. Because these relationships are
only specified at retrieval time, relational databases are classed as dynamic database management
systems. The relational database model is based on relational algebra.
Products Table
S/N   Product_code   Product_name        Price
1     Be001          A tin of Ovaltine   N5600
2     Me003          450g of pork        N458
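The customer bill calculation can be sketched with sqlite3, using the two products from the Products Table. The orders table contents, the customer IDs and the helper function bill are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_code TEXT PRIMARY KEY,
        product_name TEXT,
        price        INTEGER
    );
    CREATE TABLE orders (customer_id TEXT, product_code TEXT);
    INSERT INTO products VALUES ('Be001', 'A tin of Ovaltine', 5600);
    INSERT INTO products VALUES ('Me003', '450g of pork', 458);
    -- Hypothetical orders placed by two customers.
    INSERT INTO orders VALUES ('C1', 'Be001');
    INSERT INTO orders VALUES ('C1', 'Me003');
    INSERT INTO orders VALUES ('C2', 'Me003');
""")


def bill(customer_id):
    """Sum the prices of everything a customer ordered, joining on product_code."""
    row = conn.execute("""
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
    """, (customer_id,)).fetchone()
    return row[0]


print(bill("C1"), bill("C2"))
```

Nothing in either table records the link between a customer and a price; the relationship exists only while the join is evaluated, which is what is meant by relationships being specified at retrieval time.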
Advantages
- Structural independence, i.e. users can concentrate on the logical view.
- The structural and data independence of the relational model enables us to view data
logically rather than physically.
- The logical view allows a simpler file concept of data storage.
- The use of logically independent tables is easier to understand.
- Logical simplicity yields simpler and more effective database design methodologies.
- SQL capability
Disadvantages
- More hardware and operating system overhead, i.e. an RDBMS may be slower.
- Ease of use can be a liability, i.e. it makes misuse possible.