Unit I REVIEW OF RELATIONAL DATA MODEL AND RELATIONAL DATABASE CONSTRAINTS
What is DBMS?
Database Management Systems (DBMS) are software systems used to store, retrieve, and run
queries on data. A DBMS serves as an interface between an end-user and a database, allowing users
to create, read, update, and delete data in the database.
DBMS manage the data, the database engine, and the database schema, allowing for data to be
manipulated or extracted by users and other programs. This helps provide data security, data
integrity, concurrency, and uniform data administration procedures.
A DBMS optimizes the organization of data by following a database schema design technique called
normalization, which splits a large table into smaller tables when any of its attributes have
redundant values. DBMSs offer many benefits over traditional file systems, including flexibility and
a more sophisticated backup system.
Database management systems can be classified based on a variety of criteria such as the data
model, the database distribution, or user numbers. The most widely used types of DBMS software
are relational, distributed, hierarchical, object-oriented, and network.
A distributed DBMS is a set of logically interrelated databases distributed over a network that is
managed by a centralized database application. This type of DBMS synchronizes data periodically and
ensures that any change to data is universally updated in the database.
Hierarchical databases organize data in a tree-like structure. Data is stored in either a top-down
or bottom-up format and is represented using parent-child relationships.
The network database model addresses the need for more complex relationships by allowing each
child to have multiple parents. Entities are organized in a graph that can be accessed through several
paths.
Relational database management system
Relational database management systems (RDBMS) are the most popular type of DBMS because of their
user-friendly structure: data is normalized into the rows and columns of tables. An RDBMS is a
viable option when you need a data storage system that is scalable, flexible, and able to manage
large amounts of information.
Examples of DBMS
There is a wide range of database software solutions, including both enterprise and open-source
solutions, available for database management.
Oracle
MySQL
MySQL is a relational database management system that is commonly used with open-source
content management systems and large platforms like Facebook, Twitter, and YouTube.
SQL Server
Developed by Microsoft, SQL Server is a relational database management system built on top of
structured query language (SQL), a standardized programming language that allows database
administrators to manage databases and query data.
Relational model concepts:
The relational model for database management is an approach to logically represent and manage the
data stored in a database. In this model, the data is organized into a collection of two-dimensional
inter-related tables, also known as relations. Each relation is a collection of columns and rows, where
the columns represent the attributes of an entity and the rows (or tuples) represent the records.
The use of tables to store the data provides a straightforward, efficient, and flexible way to store and
access structured information. Because of this simplicity, this data model provides easy data sorting
and data access. Hence, it is used widely around the world for data storage and processing.
Highlights:
As discussed earlier, a relational database is based on the relational model. This database consists of
various components based on the relational model. These include:
Attribute/Field : Column of the relation, depicting properties that define the relation.
Attribute Domain : Set of pre-defined atomic values that an attribute can take i.e., it describes the
legal values that an attribute can take.
Relation Key: It is an attribute or a group of attributes that can be used to uniquely identify an entity
in a table or to determine the relationship between two tables. Relation keys can be of several
types:
1. Candidate Key: The minimal set of attributes that can uniquely identify a tuple is
known as a candidate key. For Example, STUD_NO in STUDENT relation.
2. Super Key: The set of attributes that can uniquely identify a tuple is known as Super
Key. For Example, STUD_NO, (STUD_NO, STUD_NAME), etc. A super key is a group of
single or multiple keys that identifies rows in a table. It supports NULL values.
3. Primary Key: There can be more than one candidate key in relation out of which one
can be chosen as the primary key. For Example, STUD_NO, as well as STUD_PHONE,
are candidate keys for relation STUDENT but STUD_NO can be chosen as the primary
key (only one out of many candidate keys).
4. Alternate Key: The candidate key other than the primary key is called an alternate
key.
5. Foreign Key: An attribute that can take only the values present as values of
some other attribute is a foreign key to the attribute to which it refers. The
relation that is being referenced is called the referenced relation and the
corresponding attribute is called the referenced attribute; the relation that refers
to the referenced relation is called the referencing relation and the corresponding
attribute is called the referencing attribute (see the schema sketch after the list of constraints below).
Domain Constraint: It specifies that every attribute is bound to have a value that lies inside a
specific range of values. It is implemented with the help of the Attribute Domain concept.
Key Constraint: It states that every relation must contain an attribute or a set of attributes
(Primary Key) that can uniquely identify a tuple in that relation. This key can never be NULL
or contain the same value for two different tuples.
Each data record (tuple) of a relation in a table needs to be distinct. This implies that no two
rows or tuples in a relation or table can have the same combination of values for all of their
data items. Every relation has a super key by default, which expresses this uniqueness
constraint: the combination of all of its attributes. A relation can have more than one minimal
key; each such key is a candidate key, and out of these we need to designate one as the
primary key.
Referential Integrity Constraint: It is defined between two inter-related tables. It states that
if a given relation refers to a key attribute of a different or the same table, then that key
value must exist in the referenced relation.
It is specified to maintain the consistency among the tuples of two or more relations.
Entity integrity constraint: This constraint states that a primary key value cannot be null, as it
identifies the individual tuples in a relation. A null value would make it impossible to identify
such tuples and to tell them apart from one another.
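The following is a minimal sketch, written with Python's built-in sqlite3 module, of how the key types and constraints described above might be declared on a schema. STUD_NO, STUD_NAME and STUD_PHONE come from the STUDENT example; the DEPARTMENT table, DEPT_NO, STUD_AGE and the sample rows are hypothetical additions for illustration.

import sqlite3

# In-memory schema sketch of the relational-model concepts described above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce referential integrity

# Referenced relation: DEPT_NO is the referenced attribute (its primary key).
conn.execute("""
    CREATE TABLE DEPARTMENT (
        DEPT_NO   INT PRIMARY KEY NOT NULL,
        DEPT_NAME TEXT NOT NULL
    )
""")

# Referencing relation.
conn.execute("""
    CREATE TABLE STUDENT (
        STUD_NO    INT PRIMARY KEY NOT NULL,      -- primary key: the chosen candidate key;
                                                  -- never NULL (entity integrity)
        STUD_NAME  TEXT NOT NULL,
        STUD_PHONE TEXT UNIQUE,                   -- alternate key: a candidate key that was
                                                  -- not chosen as the primary key
        STUD_AGE   INT CHECK (STUD_AGE BETWEEN 17 AND 60),  -- domain constraint on legal values
        DEPT_NO    INT REFERENCES DEPARTMENT (DEPT_NO)      -- foreign key (referencing attribute),
                                                            -- enforcing referential integrity
    )
""")

# (STUD_NO, STUD_NAME) is a super key: it identifies tuples uniquely but is not minimal.
conn.execute("INSERT INTO DEPARTMENT VALUES (1, 'CSE')")
conn.execute("INSERT INTO STUDENT VALUES (101, 'Asha', '9999900000', 20, 1)")
print(conn.execute("SELECT * FROM STUDENT").fetchall())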
Relational database schema:
Relational schemas don't contain actual data, because a schema is simply a blueprint. A developer's goal is
to design the schema in a way that ensures the information is readable and avoids redundancy. The
developer can choose to display a schema as a visual depiction, like a graph, or as statements written in a
coding language.
There are two different types of schemas: logical and physical. A logical schema represents how a
programmer organizes data within the table and a physical schema represents how a programmer
physically stores data on disk storage, which can show how they physically format the database.
Relational database schemas comprise many items to help make a clear and concise database. Each
component of a relational database schema helps define the connections within a database. Here
are some aspects to include in a relational database schema:
Tables
The primary component of relational schemas is the tables, which are typically sets of records. Tables
are usually subject-based and each contains a name and data type. The purpose of tables in a
relational database schema is to organize groups of data that developers could implement in their
databases. The number of tables depends on the size of the project.
Attributes
Attributes are the items within each table. Each table has attributes that define or describe the
table's subject. For example, a relational schema for a bakery might have tables that state
ingredients, recipes, types of baked goods, prices or customer information as attributes. In a
relational database schema, attributes are defining characteristics that determine the items in a
table. These can help further define and connect relationships between the tables.
Relations
Once the relational schema has the right number of tables, the developer needs to include relations
or connections. Developers often represent relations by using lines or arrows. The purpose of
relations is to show on the schema how each table connects to another. The lines or arrows can
convey a variety of meanings, such as two tables sharing a common attribute (field).
Primary keys
Primary keys are a column or group of columns that identify the records within a table. Each table has a
primary key, which is a unique identifier for each row. Primary key values are different for every row
because a table can't contain duplicate rows. The value of the primary key has to exist, so it can't
be null. It's also important to note that the developer can implement only one primary key per table.
Foreign keys
Foreign keys are also common in relational database schemas. A foreign key is a column or group of
columns that identifies links between tables. Foreign keys are often primary keys from different
tables so it's easy to find the connection from one table to the next. The foreign key is, therefore,
referencing the primary key of another table. Unlike primary key values, foreign key values may be
duplicated across rows and may be null. Additionally, there can be more than one foreign
key in a table.
Relational database schemas are great tools for storing, organizing and defining tables for database
information. Companies or websites that use relational databases could benefit from starting with a
schema. Benefits of a relational database schema include:
Organization
Databases typically contain a lot of information, so using a database schema can help significantly
with the organization of each set of data. The relations are also useful for organizing information by
how different fields relate to one another. Organizing data by how different tables or attributes relate
is a great first step to creating a relational database.
Accuracy
Database schemas allow for the accuracy and integrity of information in a database. Accurate fields
of information and relations can help make a useful and informative database, which is good for
companies or sites that have a lot of information to organize and analyse. Relations within schemas
also help create accurate row and column organization when it comes time to create the relational
database
Accessibility
Clear relational database schemas are usually easy to read and comprehend, which makes the tables
and sets of information accessible. The primary purpose of database schemas is to make information
accessible, and through precise tables, attributes and connections, programmers have a simple time
navigating the information and using the relational schema as a guide for the database.
Update Operation:
The three basic types of update operations are given below:
(i) Insert: This operation is performed to add a new tuple to a relation, e.g., adding another
record of an account with data values for its Code and Type to the Accounts relation is done by
performing an Insert operation. The insert operation is capable of violating any of the four
constraints discussed above.
(ii) Delete: This operation is carried out to remove a tuple from a relation. A particular data record
in a table can be removed by performing such an operation. The delete operation can violate only
referential integrity, if the tuple being removed is referenced by foreign keys from other tuples in the
database.
(iii) Modify: This operation changes the values of some attributes in existing tuples; it is useful for
modifying the existing values of an accounting record in a data table. Usually, this operation does not
cause problems provided the modification touches neither a primary key nor a foreign key. Whenever
applied, these operations must enforce the integrity constraints specified on the relational database
schema. Retrieval operations on the relational data model do not violate any integrity constraints.
Anomalies
An anomaly is a deviation from the norm, a glitch or an error that doesn’t fit in with the rest of the
pattern of the database. Normalization takes care of these anomalies. Normalization ensures that all
three challenges (update, insert, and delete anomalies), as well as any others that may arise, are
addressed during the design process.
Normalization is required to organise data in a database. If it is not done, the overall data integrity in
the database will deteriorate over time. This is related to data anomalies in particular. These
DBMS anomalies are common, and they result in data that doesn't match the real world the
database claims to reflect.
When there is too much redundancy in the information present in the database, anomalies occur.
Also, when all the tables that make up a database are poorly constructed, anomalies are bound to
occur.
What exactly does “bad construction” imply? When the database designer constructs the
database, they should identify the entities that depend on one another for existence, such as hotel rooms
and the hotel, and then reduce the probability that one could ever exist independently of the other.
A database anomaly is a fault in a database that usually emerges as a result of shoddy planning and
storing everything in a flat database. In most cases, this is removed through the normalization
procedure, which involves the joining and splitting of tables. The purpose of the normalization
process is to minimise the negative impacts of generating tables that would generate anomalies in
the DB.
Example
Consider a manufacturing firm that keeps worker information in a table called employee, which has
four columns: w_id for the employee’s id, w_name for the employee’s name, w_address for the
employee’s address, and w_dept for the employee’s department. Suppose that, at some point, the table
contains two rows for employee David (one for each of his departments) and a single row for employee
Mike, the only employee assigned to department F890.
Various types of anomalies can occur in a database. Redundancy anomalies, for instance, are a
significant issue, but they can be easily identified and fixed. The following are the ones about
which we should really be worried:
1. Update
2. Insert
3. Delete
Anomalies in databases can thus be divided into three major categories:
Update Anomaly
Employee David has two rows in the employee table described above, since he works in two different
departments. If we want to change David's address, we must do so in both rows, else the data will
become inconsistent.
If the proper address is updated in one of the departments but not in another, David will have two
different addresses in the database, which is incorrect and leads to inconsistent data.
Insert Anomaly
If a new worker joins the firm and is currently unassigned to any department, we will be unable to
put the data into the table because the w_dept field does not allow nulls.
Delete Anomaly
If the corporation closes the department F890 at some point in the future, deleting the rows with
w_dept as F890 will also erase the information of employee Mike, who is solely assigned to this
department.
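The three anomalies can be reproduced with a small script. Below is a minimal sketch using Python's built-in sqlite3 module; the employee table, David, Mike and department F890 come from the example above, while the ids, addresses and the codes of David's two departments are hypothetical.

import sqlite3

# Flat (unnormalized) employee table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        w_id      INTEGER NOT NULL,
        w_name    TEXT    NOT NULL,
        w_address TEXT    NOT NULL,
        w_dept    TEXT    NOT NULL   -- the department column does not allow NULLs
    )
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (101, "David", "12 Oak St", "F234"),   # David works in two departments,
    (101, "David", "12 Oak St", "F456"),   # so his data is stored twice
    (102, "Mike",  "9 Elm St",  "F890"),   # Mike is the only employee in F890
])

# Update anomaly: changing David's address in only one of his rows leaves
# the database with two different addresses for the same person.
conn.execute("UPDATE employee SET w_address = '7 Pine Rd' "
             "WHERE w_id = 101 AND w_dept = 'F234'")
print(conn.execute("SELECT DISTINCT w_address FROM employee WHERE w_id = 101").fetchall())

# Insert anomaly: a new worker who is not yet assigned to a department
# cannot be inserted, because w_dept does not allow NULL values.
try:
    conn.execute("INSERT INTO employee VALUES (103, 'Asha', '3 Birch Ln', NULL)")
except sqlite3.IntegrityError as err:
    print("Insert rejected:", err)

# Delete anomaly: closing department F890 also erases all information
# about Mike, who is assigned only to that department.
conn.execute("DELETE FROM employee WHERE w_dept = 'F890'")
print(conn.execute("SELECT * FROM employee WHERE w_name = 'Mike'").fetchall())  # -> []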
The three basic modification operations on a relation are Insert, Delete, and Update (Modify).
Whenever we apply any of these modifications to a relation in the database, the constraints on the
relational database should not get violated.
Insert operation:
On inserting the tuples in the relation, it may cause violation of the constraints in the following way:
1. Domain constraint:
Domain constraint gets violated only when a value given to an attribute does not appear in the
corresponding domain or is not of the appropriate datatype.
Example:
Assume that the domain constraint says that all the values inserted into the attribute should be
greater than 10. Inserting a value less than 10 then violates the domain constraint, so the insertion
gets rejected.
2. Entity integrity constraint:
Inserting NULL values into any part of the primary key of a new tuple in the relation causes a
violation of the entity integrity constraint.
Example:
Such an insertion violates the entity integrity constraint because there is a NULL value for the
primary key, so it gets rejected.
3. Key constraint:
Inserting into the new tuple of a relation a key value that already exists in another tuple of the
same relation causes a violation of the key constraint.
Example:
This insertion violates the key constraint if EID = 1200 is already present in some tuple of the same
relation, so it gets rejected.
4. Referential integrity constraint:
Inserting a value into the foreign key of relation 1 for which there is no corresponding value in the
primary key that is referred to in relation 2 violates referential integrity.
Example:
When we try to insert a value, say 1200, into EID (the foreign key) of table 1 for which there is no
corresponding EID (the primary key) in table 2, the insertion causes a violation, so it gets rejected.
The possible way to correct such a violation is simple: if an insertion violates any of the constraints,
the default action is to reject the operation.
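A minimal sketch of these four rejection cases, again using Python's sqlite3 module. The EID value 1200 and the greater-than-10 domain rule follow the examples above; the EMPLOYEE and DEPARTMENT tables and the remaining columns and values are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE DEPARTMENT (DNO INT PRIMARY KEY NOT NULL, DNAME TEXT)")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EID    INT PRIMARY KEY NOT NULL,               -- key constraint + entity integrity
        ENAME  TEXT,
        SALARY INT CHECK (SALARY > 10),                -- domain constraint: values must exceed 10
        DNO    INT REFERENCES DEPARTMENT (DNO)         -- referential integrity
    )
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Accounts')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1200, 'Arun', 500, 10)")  # a valid tuple

# Each of the insertions below violates one constraint and is therefore rejected.
violations = {
    "domain constraint":     (1201, "Bina", 5,  10),   # SALARY not greater than 10
    "entity integrity":      (None, "Chad", 40, 10),   # NULL primary key
    "key constraint":        (1200, "Dina", 40, 10),   # EID 1200 already exists
    "referential integrity": (1202, "Evan", 40, 99),   # DNO 99 has no matching DEPARTMENT row
}
for name, row in violations.items():
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as err:
        print(f"{name} violated -> insert rejected: {err}")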
Deletion operation:
On deleting the tuples in the relation, it may cause only violation of Referential integrity constraints.
It causes a violation only if a tuple of table 1 that is referenced by foreign keys from other tuples of
table 2 in the database is deleted. If such a deletion takes place, the foreign key values in those tuples
of table 2 will be left referring to a tuple that no longer exists, which violates the referential integrity
constraint.
Solutions that are possible to correct the violation to the referential integrity due to deletion are
listed below:
Restrict –
Here we reject the deletion.
Cascade –
Here, if a record in the parent table (the referenced relation) is deleted, then the corresponding records
in the child table (the referencing relation) are automatically deleted as well.
Set NULL / Set Default –
Here we modify the referencing attribute values that cause the violation: we either set them to NULL or
change them to another valid value.
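A minimal sketch of the three corrective options using Python's sqlite3 module; the parent and child table names are hypothetical, and each child table declares a different ON DELETE action on its foreign key.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INT PRIMARY KEY NOT NULL)")

# One child (referencing) table per corrective action.
conn.execute("CREATE TABLE child_restrict (pid INT REFERENCES parent (id) ON DELETE RESTRICT)")
conn.execute("CREATE TABLE child_cascade  (pid INT REFERENCES parent (id) ON DELETE CASCADE)")
conn.execute("CREATE TABLE child_setnull  (pid INT REFERENCES parent (id) ON DELETE SET NULL)")

conn.execute("INSERT INTO parent VALUES (1)")
for table in ("child_restrict", "child_cascade", "child_setnull"):
    conn.execute(f"INSERT INTO {table} VALUES (1)")

# Restrict: deleting the referenced parent row is rejected.
try:
    conn.execute("DELETE FROM parent WHERE id = 1")
except sqlite3.IntegrityError as err:
    print("Restrict:", err)

# With the restricting child row removed, the deletion goes through and the
# remaining children show the cascade and set-null behaviours.
conn.execute("DELETE FROM child_restrict")
conn.execute("DELETE FROM parent WHERE id = 1")
print(conn.execute("SELECT * FROM child_cascade").fetchall())   # -> []        (row deleted too)
print(conn.execute("SELECT * FROM child_setnull").fetchall())   # -> [(None,)] (set to NULL)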
Object-oriented database management system (OODBMS):
Objects are the basic building block and an instance of a class, where the type is either built-in
or user-defined. Pointers help access elements of an object database and establish relations between
objects.
The main characteristic of objects in OODBMS is the possibility of user-constructed types. An object
created in a project or application saves into a database as is.
Object-oriented databases directly deal with data as complete objects. All the information comes in
one instantly available object package instead of multiple tables.
In contrast, the basic building blocks of relational databases, such as PostgreSQL or MySQL, are tables
with actions based on logical connections between the table data.
These characteristics make object databases suitable for projects with complex data which require an
object-oriented approach to programming. An object-oriented management system provides
supported functionality catered to object-oriented programming where complex objects are central.
This approach unifies attributes and behaviors of data into one entity.
Polymorphism
Inheritance
Encapsulation
Abstraction
These four attributes describe the critical characteristics of object-oriented management systems.
Polymorphism
Polymorphism is the capability of an object to take multiple forms. This ability allows the same
program code to work with different data types. Both a car and a bike are able to brake, but their
mechanisms differ. In this example, the action brake is polymorphic: the result changes depending
on which vehicle performs it.
Inheritance
Inheritance creates a hierarchical relationship between related classes while making parts of the code
reusable. A newly defined type inherits all of the existing class's fields and methods and can further
extend them. The existing class is the parent class, while the child class extends the parent.
For example, a parent class called Vehicle will have child classes Car and Bike. Both child
classes inherit information from the parent class and extend the parent class with new information
depending on the vehicle type.
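A minimal Python sketch of the Vehicle, Car and Bike classes above, which also shows the polymorphic brake action from the previous example; the attribute names and return strings are illustrative.

class Vehicle:
    """Parent class: fields and methods that the child classes inherit."""
    def __init__(self, wheels):
        self.wheels = wheels

    def brake(self):
        return "generic braking mechanism"


class Car(Vehicle):
    """Child class: inherits from Vehicle and extends/overrides it."""
    def __init__(self):
        super().__init__(wheels=4)

    def brake(self):                      # polymorphism: same action, car-specific result
        return "hydraulic disc brakes on four wheels"


class Bike(Vehicle):
    def __init__(self):
        super().__init__(wheels=2)

    def brake(self):                      # polymorphism: same action, bike-specific result
        return "cable-operated rim brakes"


# The same code works for any Vehicle; the result depends on which vehicle performs it.
for vehicle in (Car(), Bike()):
    print(type(vehicle).__name__, "->", vehicle.brake())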
Encapsulation
Encapsulation is the ability to group data and mechanisms into a single object to provide access
protection. Through this process, pieces of information and details of how an object works
are hidden, resulting in data and function security. Classes interact with each other through methods
without the need to know how particular methods work.
As an example, a car has descriptive characteristics and actions. You can change the color of a car, yet
the model or make are examples of properties that cannot change. A class encapsulates all the car
information into one entity, where some elements are modifiable while some are not.
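A minimal Python sketch of the car example above, assuming the color is the modifiable property while the make and model are not; the internal fields are hidden and exposed only through a controlled interface.

class Car:
    """Encapsulates car data and behavior; internal fields are reached only via the class interface."""
    def __init__(self, make, model, color):
        self.__make = make       # name-mangled attributes hide the internal details
        self.__model = model
        self.__color = color

    @property
    def color(self):             # readable ...
        return self.__color

    @color.setter
    def color(self, new_color):  # ... and modifiable through a controlled interface
        self.__color = new_color

    @property
    def make(self):              # readable, but no setter: the make cannot be changed
        return self.__make

    @property
    def model(self):             # readable, but no setter: the model cannot be changed
        return self.__model


car = Car("Toyota", "Corolla", "red")
car.color = "blue"               # allowed: repainting changes the color
print(car.make, car.model, car.color)
# car.make = "Honda"             # would raise AttributeError: make is read-only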
Abstraction
Abstraction is the procedure of representing only the essential data features for the needed
functionality. The process selects vital information while unnecessary information stays hidden.
Abstraction helps reduce the complexity of modelled data and allows reusability.
For example, there are different ways for a computer to connect to the network. A web browser
needs an internet connection. However, the connection type is irrelevant. An established connection
to the internet represents an abstraction, whereas the various types of connections represent
different implementations of the abstraction.
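A minimal Python sketch of the network-connection example using the standard abc module; the class and function names are illustrative. The browser-side code relies only on the abstract connect operation, not on the connection type.

from abc import ABC, abstractmethod


class InternetConnection(ABC):
    """Abstraction: only the essential connect operation is exposed."""
    @abstractmethod
    def connect(self):
        ...


class WifiConnection(InternetConnection):
    def connect(self):           # one concrete implementation of the abstraction
        return "connected over Wi-Fi"


class EthernetConnection(InternetConnection):
    def connect(self):           # another implementation; its details stay hidden
        return "connected over Ethernet"


def open_browser(connection):
    # The browser needs an internet connection; the connection type is irrelevant.
    print("Browser ready:", connection.connect())


open_browser(WifiConnection())
open_browser(EthernetConnection())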
Object-Oriented Database Advantages and Disadvantages:
Advantages
Complex data and a wider variety of data types compared to MySQL data types.
Disadvantages
Common features of object-oriented databases include:
Query Language – Language to find objects and retrieve data from the database.
Transparent Persistence – Ability to use an object-oriented programming language for data manipulation.
ACID Transactions – Guarantee that all transactions are completed without conflicting changes.
Database Caching – Creates a partial replica of the database, allowing access to the database from
program memory instead of from a disk.
GemStone/S
GemStone/S is best for high-availability projects. There are multiple options for licensing depending
on the project size. The database server is available for various platforms, including Linux, Windows,
macOS, Solaris, AIX, as well as Raspberry Pi.
ObjectDB
ObjectDB is a NoSQL object database for the Java programming language. Compared to other NoSQL
databases, ObjectDB is ACID compliant. ObjectDB does not provide a proprietary API and instead requires
using one of the two standard Java database APIs: JPA (Java Persistence API) or JDO (Java Data Objects).
ObjectDB includes all basic data types in Java, user-defined classes, and standard Java collections.
Every object has a unique ID. The number of elements is limited only by the maximum database size
(128 TB). ObjectDB is available cross-platform and the benchmark performance is exceptional.
ObjectDatabase++
ObjectDatabase++ supports real-time recovery. The object database is C++ based. One of its main
features is advanced auto-recovery from system crashes without compromising the database integrity.
Objectivity/DB
Objectivity/DB utilizes the power of objects and satisfies the complex requirements within Big Data.
The object database is flexible by supporting multiple languages:
C++
C#
Python
Java
The schema changes happen dynamically without the need for downtime, allowing real-time queries
against any data type. Objectivity/DB is available for multiple platforms, including macOS, Linux,
Windows, or Unix.
ObjectStore
ObjectStore integrates with C++ or Java and provides memory persistency to improve the
performance of application logic. The object database is ACID-compliant. The responsiveness allows
developers to build distributed applications cross-platform, whether on-premises or in the cloud.
The main feature is cloud scalability, which allows database access from anywhere. ObjectStore
simplifies the data creation and exchange process seamlessly.
Versant
Versant primarily provides transparent object persistence from C++, Java, and .NET; there is
also support for Smalltalk and Python. Versant supports different APIs depending on the language
used, and standard SQL queries are available as well, even though Versant is classed as a NoSQL database.
The object database is a multi-user client-server database. Versant performs best when used for
online transaction systems with large amounts of data and concurrent users.
Abstract Data Types:
Data types such as int, float, double, long, etc. are considered built-in data types, and we can
perform basic operations on them such as addition, subtraction, division, and multiplication. There
might, however, be a situation when we need to define operations for a user-defined data type. These
operations can be defined only as and when we require them. So, in order to simplify the
process of solving problems, we can create data structures along with their operations, and such
data structures that are not built in are known as Abstract Data Types (ADT).
Abstract Data type (ADT) is a type (or class) for objects whose behavior is defined by a set of values
and a set of operations. The definition of ADT only mentions what operations are to be performed
but not how these operations will be implemented. It does not specify how data will be organized in
memory and what algorithms will be used for implementing the operations. It is called “abstract”
because it gives an implementation-independent view.
The process of providing only the essentials and hiding the details is known as abstraction.
The user of a data type does not need to know how that data type is implemented; for example, we
have been using primitive types like int, float, and char with only the knowledge of what operations
can be performed on them, and without any idea of how they are implemented.
So a user only needs to know what a data type can do, but not how it will be implemented. Think of
ADT as a black box which hides the inner structure and design of the data type. Now we’ll define
three ADTs namely List ADT, Stack ADT, Queue ADT.
1. List ADT
The data is generally stored in key sequence in a list which has a head structure consisting
of count, pointers and address of compare function needed to compare the data in the list.
The data node contains the pointer to a data structure and a self-referential pointer which
points to the next node in the list.
remove() – Remove the first occurrence of any element from a non-empty list.
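A minimal Python sketch of such a list ADT. Only remove() is named above; the insert() and size() operations, and the exact node layout, are illustrative assumptions.

class _Node:
    """Data node: holds a reference to the data and a self-referential link to the next node."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


class ListADT:
    """Head structure: keeps the count, a pointer to the first node, and a compare function."""
    def __init__(self, compare=lambda a, b: a == b):
        self._head = None
        self._count = 0
        self._compare = compare

    def insert(self, data):
        """Insert a new element at the front of the list (illustrative operation)."""
        self._head = _Node(data, self._head)
        self._count += 1

    def remove(self, data):
        """Remove the first occurrence of data from a non-empty list; return True if removed."""
        prev, current = None, self._head
        while current is not None:
            if self._compare(current.data, data):
                if prev is None:
                    self._head = current.next
                else:
                    prev.next = current.next
                self._count -= 1
                return True
            prev, current = current, current.next
        return False

    def size(self):
        return self._count


items = ListADT()
for value in (30, 20, 10):
    items.insert(value)
items.remove(20)
print(items.size())   # -> 2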
2. Stack ADT
View of stack
In Stack ADT Implementation instead of data being stored in each node, the pointer to data is
stored.
The program allocates memory for the data and address is passed to the stack ADT.
The head node and the data nodes are encapsulated in the ADT. The calling function can only
see the pointer to the stack.
The stack head structure also contains a pointer to top and count of number of entries
currently in stack.
pop() – Remove and return the element at the top of the stack, if it is not empty.
peek() – Return the element at the top of the stack without removing it, if the stack is not
empty.
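A minimal Python sketch of the stack ADT. Only pop() and peek() are named above; push() and the node layout are illustrative assumptions, with each node storing a reference to the caller's data rather than the data itself.

class _StackNode:
    """Each node stores a reference (pointer) to the data, not the data itself."""
    def __init__(self, data_ref, below=None):
        self.data_ref = data_ref
        self.below = below


class StackADT:
    """Head structure: a pointer to the top node and a count of entries currently in the stack."""
    def __init__(self):
        self._top = None
        self._count = 0

    def push(self, data_ref):
        """Place a reference to the caller's data on top of the stack (illustrative operation)."""
        self._top = _StackNode(data_ref, self._top)
        self._count += 1

    def pop(self):
        """Remove and return the element at the top of the stack, if it is not empty."""
        if self._top is None:
            raise IndexError("pop from empty stack")
        data_ref = self._top.data_ref
        self._top = self._top.below
        self._count -= 1
        return data_ref

    def peek(self):
        """Return the element at the top of the stack without removing it, if it is not empty."""
        if self._top is None:
            raise IndexError("peek at empty stack")
        return self._top.data_ref


stack = StackADT()
stack.push("A")
stack.push("B")
print(stack.peek())   # -> B
print(stack.pop())    # -> B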
3. Queue ADT
View of Queue
The queue abstract data type (ADT) follows the basic design of the stack abstract data type.
Each node contains a void pointer to the data and the link pointer to the next element in the queue.
The program’s responsibility is to allocate memory for storing the data.
dequeue() – Remove and return the first element of the queue, if the queue is not empty.
peek() – Return the element of the queue without removing it, if the queue is not empty.
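A minimal Python sketch of the queue ADT. Only dequeue() and peek() are named above; enqueue() and the front/rear pointers are illustrative assumptions.

class _QueueNode:
    """Each node holds a reference to the data and a link to the next element in the queue."""
    def __init__(self, data_ref):
        self.data_ref = data_ref
        self.next = None


class QueueADT:
    """Head structure with pointers to the front and rear nodes and a count of entries."""
    def __init__(self):
        self._front = None
        self._rear = None
        self._count = 0

    def enqueue(self, data_ref):
        """Add an element at the rear of the queue (illustrative operation)."""
        node = _QueueNode(data_ref)
        if self._rear is None:
            self._front = self._rear = node
        else:
            self._rear.next = node
            self._rear = node
        self._count += 1

    def dequeue(self):
        """Remove and return the first element of the queue, if the queue is not empty."""
        if self._front is None:
            raise IndexError("dequeue from empty queue")
        data_ref = self._front.data_ref
        self._front = self._front.next
        if self._front is None:
            self._rear = None
        self._count -= 1
        return data_ref

    def peek(self):
        """Return the first element of the queue without removing it, if the queue is not empty."""
        if self._front is None:
            raise IndexError("peek at empty queue")
        return self._front.data_ref


queue = QueueADT()
queue.enqueue("job-1")
queue.enqueue("job-2")
print(queue.peek())      # -> job-1
print(queue.dequeue())   # -> job-1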
Abstract data types (ADTs) are a way of encapsulating data and operations on that data into a single
unit. Some of the key features of ADTs include:
Abstraction: The user does not need to know the implementation of the data structure; only the
essentials are provided.
Robust: The program is robust and has the ability to catch errors.
Encapsulation: ADTs hide the internal details of the data and provide a public interface for
users to interact with the data. This allows for easier maintenance and modification of the
data structure.
Data Abstraction: ADTs provide a level of abstraction from the implementation details of the
data. Users only need to know the operations that can be performed on the data, not how
those operations are implemented.
Data Structure Independence: ADTs can be implemented using different data structures,
such as arrays or linked lists, without affecting the functionality of the ADT.
Information Hiding: ADTs can protect the integrity of the data by allowing access only to
authorized users and operations. This helps prevent errors and misuse of the data.
Modularity: ADTs can be combined with other ADTs to form larger, more complex data
structures. This allows for greater flexibility and modularity in programming.