chp1
Chapter 1
Syllabus Outline CH 1
• Relational Model Integration: overview of database management systems,
limitations of the data processing environment, database approach, instance
and schema, three levels of abstraction, DBMS structure, ACID properties
• Entity-Relationship Model: Entity, attributes, keys, relationships, cardinality,
participation, weak entities, ER diagram, Generalization, Specialization and
aggregation, conceptual design with ER model, entity versus attribute,
entity versus relationship, binary versus ternary relationship, aggregate
versus ternary relationship.
• Functional dependency, Relational Database design: features of good
relational database design, atomic domain and Normalization (1NF, 2NF,
3NF, BCNF).
Database
• A database is a collection of data, typically describing the activities of
one or more related organizations.
• For example, a university database might contain information about
the following:
• Entities such as students, faculty, courses, and classrooms.
• Relationships between entities, such as students' enrollment in courses,
faculty teaching courses, and the use of rooms for courses.
DBMS
• A database management system, or DBMS, is software designed to
assist in maintaining and utilizing large collections of data. The need
for such systems, as well as their use, is growing rapidly.
File systems v/s database
Suppose a large collection of data (say, 500 GB) is kept in ordinary operating system files instead of a DBMS. This approach has several drawbacks:
• We probably do not have 500 GB of main memory to hold all the data. We must therefore store data in a
storage device such as a disk or tape and bring relevant parts into main memory for processing as needed.
• Even if we had 500 GB of main memory, on computer systems with 32-bit addressing we cannot refer
directly to more than about 4 GB of data. We have to program some method of identifying all data items.
• We have to write special programs to answer each question a user may want to ask about the data. These
programs are likely to be complex because of the large volume of data to be searched.
• We must protect the data from inconsistent changes made by different users accessing the data
concurrently. If applications must address the details of such concurrent access, this adds greatly to their
complexity.
• We must ensure that data is restored to a consistent state if the system crashes while changes are being
made.
• Operating systems provide only a password mechanism for security. This is not sufficiently flexible to enforce
security policies in which different users have permission to access different subsets of the data.
Advantages of DBMS
• Data Independence: Application programs should not, ideally, be exposed
to details of data representation and storage. The DBMS provides an abstract
view of the data that hides such details.
• Efficient Data Access: A DBMS utilizes a variety of sophisticated techniques
to store and retrieve data efficiently. This feature is especially important if
the data is stored on external storage devices.
• Data Integrity and Security: If data is always accessed through the DBMS,
the DBMS can enforce integrity constraints.
• For example, before inserting salary information for an employee, the DBMS can
check that the department budget is not exceeded.
• Also, it can enforce access controls that govern what data is visible to different
classes of users.
• Data Administration: When several users share the data, centralizing the
administration of data can offer significant improvements. A central administrator
can organize the data representation to minimize redundancy and fine-tune the
storage of the data to make retrieval efficient.
• Concurrent Access and Crash Recovery: A DBMS schedules concurrent accesses
to the data in such a manner that users can think of the data as being accessed by
only one user at a time. Further, the DBMS protects users from the effects of
system failures.
• Reduced Application Development Time: Clearly, the DBMS supports important
functions that are common to many applications accessing data in the DBMS.
This, in conjunction with the high-level interface to the data, facilitates quick
application development.
• DBMS applications are also likely to be more robust than similar stand-alone
applications because many important tasks are handled by the DBMS.
Data Model
• A data model is a collection of high-level data description constructs
that hide many low-level storage details.
• A DBMS allows a user to define the data to be stored in terms of a
data model.
• An RDBMS (relational DBMS) is a database management system in which data
is stored in the form of tables (relations).
• The relational model underlies SQL and many widely used database systems,
such as MySQL, Microsoft SQL Server, Oracle, and Microsoft Access.
Database Table
• A table is a collection of related data entries, and it consists of
columns and rows.
• A column holds specific information about every record in the table.
• A record (or row) is each individual entry that exists in a table.
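To make these terms concrete, here is a minimal sketch using Python's standard `sqlite3` module with an in-memory database. The `student` table, its columns, and the sample rows are hypothetical and only illustrate how columns and records relate.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table is defined by its columns; each column has a name and a type.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)")

# Each inserted tuple becomes one record (row) of the table.
conn.executemany(
    "INSERT INTO student (roll_no, name, dept_name) VALUES (?, ?, ?)",
    [(1, "Asha", "CSE"), (2, "Ravi", "ME"), (3, "Meera", "CSE")],
)

# Reading one column returns that attribute for every record.
names = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY roll_no")]
print(names)  # ['Asha', 'Ravi', 'Meera']

# Reading one row returns all columns of a single record.
record = conn.execute("SELECT * FROM student WHERE roll_no = 2").fetchone()
print(record)  # (2, 'Ravi', 'ME')
```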
Semantic Data Model
• A semantic data model is a more abstract, high-level data model that makes
it easier for a user to come up with a good initial description of the data in
an enterprise.
• These models contain a wide variety of constructs that help describe a real
application scenario.
• A database design in terms of a semantic model serves as a useful starting
point and is subsequently translated into a database design in terms of the
data model the DBMS actually supports.
• The ER Model is used to model the logical view of the system from a data
perspective; its diagrams are built from a standard set of symbols. Participation
constraints are drawn as follows:
1. Total Participation – Each entity in the entity set must participate in the
relationship. If each student must enroll in a course, the participation of
students will be total. Total participation is shown by a double line in the ER
diagram.
2. Partial Participation – An entity in the entity set may or may NOT
participate in the relationship. If some courses have no students enrolled,
the participation of courses will be partial. Partial participation is shown by a
single line.
• The diagram depicts the ‘Enrolled in’ relationship set with Student
Entity set having total participation and Course Entity set having
partial participation.
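The difference between total and partial participation can be checked mechanically. The sketch below is a hypothetical illustration: the entity sets are plain Python sets of identifiers, the 'Enrolled in' relationship set is a set of (student, course) pairs, and the function name `participation` is made up for this example.

```python
# Hypothetical sample data: two entity sets and the 'Enrolled in' relationship set.
students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3"}
enrolled_in = {("S1", "C1"), ("S2", "C1"), ("S3", "C2")}  # (student, course) pairs


def participation(entities, relationship, position):
    """Return 'total' if every entity appears in the relationship at the
    given position of the pair, otherwise 'partial'."""
    participating = {pair[position] for pair in relationship}
    return "total" if entities <= participating else "partial"


student_side = participation(students, enrolled_in, 0)
course_side = participation(courses, enrolled_in, 1)
print(student_side)  # total
print(course_side)   # partial (course C3 has no students)
```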
How to Draw ER Diagram?
• The very first step is to identify all the entities, place each one in a
rectangle, and label them accordingly.
• The next step is to identify the relationships between them and place
them accordingly using diamonds, making sure that relationships are not
connected to each other.
• Attach attributes to the entities properly.
• Remove redundant entities and relationships.
• Add proper colors to highlight the data present in the database.
Enhanced ER Model
• Generalization, Specialization, and Aggregation in the ER model are used
for data abstraction, in which an abstraction mechanism is used to
hide details of a set of objects. These concepts were added in the
Enhanced ER (EER) Model, which extends the basic ER model. The
new concepts are:
• Generalization
• Specialization
• Aggregation
Generalization
• Generalization is the process of extracting
common properties from a set of entities
and creating a generalized entity from it.
• It is a bottom-up approach in which two or
more entities can be generalized to a
higher-level entity if they have some
attributes in common.
• For example, STUDENT and FACULTY can be
generalized to a higher-level entity called
PERSON. In this case, common attributes like P_NAME and P_ADD
become part of the higher-level entity (PERSON), and specialized
attributes like S_FEE become part of a specialized entity
(STUDENT).
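Generalization has a loose analogy in class inheritance. The sketch below uses Python dataclasses only as an illustration (an ER diagram is not an object model): PERSON holds the shared attributes P_NAME and P_ADD, STUDENT adds S_FEE, and the FACULTY attribute `salary` is a hypothetical addition not taken from the example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Attributes common to STUDENT and FACULTY are moved up to PERSON.
    p_name: str
    p_add: str


@dataclass
class Student(Person):
    # Specialized attribute that applies only to students.
    s_fee: float


@dataclass
class Faculty(Person):
    # Hypothetical specialized attribute for faculty (not from the example).
    salary: float


s = Student(p_name="Asha", p_add="Pune", s_fee=50000.0)
f = Faculty(p_name="Rao", p_add="Mumbai", salary=90000.0)
print(isinstance(s, Person), isinstance(f, Person))  # True True
```

Both subclasses inherit `p_name` and `p_add`, while `s_fee` exists only on `Student`, mirroring how shared attributes live in the generalized entity.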
Specialization
• In specialization, an entity is divided into sub-entities based on its
characteristics. It is a top-down approach where the higher-level entity
is specialized into two or more lower-level entities. For example, an
EMPLOYEE entity in an employee management system can be specialized into
DEVELOPER, TESTER, etc.
Aggregation
• A basic ER diagram cannot represent a relationship between an entity and
a relationship, which may be required in some scenarios. In those
cases, a relationship together with its participating entities is aggregated into a
higher-level entity.
• Aggregation is an abstraction through which we can represent relationships
as higher-level entity sets.
• For example, an employee working on a project may require some
machinery. So, a REQUIRE relationship is needed between the relationship
WORKS_FOR and the entity MACHINERY.
• Using aggregation, the WORKS_FOR relationship with its entities EMPLOYEE and
PROJECT is aggregated into a single entity, and the relationship REQUIRE is
created between the aggregated entity and MACHINERY.
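One common way to carry an aggregation into relational tables is to give the aggregated relationship its own table and let the outer relationship reference that table's composite key. The sketch below assumes this mapping and uses SQLite through Python's `sqlite3` module; table and column names such as `works_for`, `requires`, and `mach_id` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE employee  (emp_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project   (proj_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE machinery (mach_id INTEGER PRIMARY KEY, kind TEXT);

-- WORKS_FOR: one row per (employee, project) pair.
CREATE TABLE works_for (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);

-- REQUIRE links the aggregated WORKS_FOR pair to MACHINERY,
-- so it references the whole composite key of works_for.
CREATE TABLE requires (
    emp_id  INTEGER,
    proj_id INTEGER,
    mach_id INTEGER REFERENCES machinery(mach_id),
    PRIMARY KEY (emp_id, proj_id, mach_id),
    FOREIGN KEY (emp_id, proj_id) REFERENCES works_for(emp_id, proj_id)
);
""")

conn.execute("INSERT INTO employee VALUES (1, 'Asha')")
conn.execute("INSERT INTO project VALUES (10, 'Bridge')")
conn.execute("INSERT INTO machinery VALUES (100, 'Crane')")
conn.execute("INSERT INTO works_for VALUES (1, 10)")
conn.execute("INSERT INTO requires VALUES (1, 10, 100)")  # accepted: (1, 10) is a WORKS_FOR pair

try:
    # Rejected: there is no WORKS_FOR pair (1, 99) for REQUIRE to refer to.
    conn.execute("INSERT INTO requires VALUES (1, 99, 100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```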
Entity V/S Attribute
• Entity and attribute are two essential terms of a database
management system (DBMS). The main difference between them is that
an entity is a real-world object, and attributes describe the properties
of an entity.
Entity V/S Relationship
• An entity can have one or more attributes, which are properties or
characteristics that describe the entity. For example, a customer can
have attributes such as name, email, and address. A relationship is a
connection or association between two or more entities that
expresses how they interact or depend on each other.
ER Model V/S Relational Model
• The E-R Model and the Relational Model are two data models used at
different stages of database design. The difference between them is a
common interview question. The key distinction is that the E-R Model is
entity-specific, while the Relational Model is table-specific.
Differences between ER Model and Relational Model
• The ER model describes the relationships between entities and their
attributes. The Relational Model, on the other hand, refers to the
implementation of our model.
• The Relational Model is the implementation or representational model,
while the ER Model is the high-level or conceptual model.
• The data in components such as entity sets, relationship sets, and
attributes are represented by an ER model. The Relational model, on the
other hand, defines data in components such as tuples, attributes, and
attribute domains.
• As compared to a Relational Model, an ER model makes it easier to
understand the relationships between entities.
• Mapping cardinality is expressed directly as a constraint in the ER model, while the
Relational Model has no direct construct for it; cardinality is only partly captured through keys and foreign keys.
Questions
• Binary V/s Ternary relationship
• Aggregate v/s Ternary relationship
Section 3
Functional dependency, Relational Database design: features of good relational
database design, atomic domain and Normalization (1NF, 2NF, 3NF, BCNF).
Functional Dependency
• In relational database management, a functional dependency is a
constraint that specifies the relationship between two sets of attributes,
where one set of attributes determines the values of another. It is
denoted as X → Y, where the attribute set on the left side of the
arrow, X, is called the Determinant, and Y is called the Dependent.
X → Y holds if any two tuples that agree on X also agree on Y.
• Functional dependencies are used to mathematically express
relations among database entities and are very important for
understanding advanced concepts in relational database systems.
Example of Functional Dependency
From the above table we can conclude some valid
functional dependencies:
• roll_no → {name, dept_name, dept_building}
• roll_no → dept_name
• dept_name → dept_building: dept_name identifies the
dept_building accurately, since all rows with the same
dept_name have the same dept_building (each department
is housed in one building).
• More valid functional dependencies:
• roll_no → name
• {roll_no, name} → {dept_name, dept_building}, etc.
Some invalid functional dependencies:
• name → dept_name: students with the same name can have different
dept_name values, hence this is not a valid functional dependency.
• dept_building → dept_name: there can be multiple departments in the same
building. For example, in the above table departments ME and EC are in the same
building B2, hence dept_building → dept_name is an invalid functional
dependency.
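A sample table can only refute a functional dependency: if two rows agree on X but differ on Y, X → Y fails, but a table that happens to satisfy X → Y does not prove the dependency holds for all legal data. The sketch below checks candidate FDs against illustrative rows chosen to match the facts stated above (departments ME and EC share building B2, and two students share a name); the rows and the helper `fd_holds` are hypothetical.

```python
# Illustrative rows consistent with the facts stated in the text.
rows = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
]


def fd_holds(rows, lhs, rhs):
    """X -> Y is consistent with the rows if no two rows agree on X but differ on Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first Y seen for this X is remembered; any different Y is a violation.
        if seen.setdefault(x, y) != y:
            return False
    return True


print(fd_holds(rows, ["roll_no"], ["name", "dept_name", "dept_building"]))  # True
print(fd_holds(rows, ["dept_name"], ["dept_building"]))  # True
print(fd_holds(rows, ["name"], ["dept_name"]))           # False
print(fd_holds(rows, ["dept_building"], ["dept_name"]))  # False
```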
Armstrong’s axioms/properties of functional dependencies
• Reflexivity: If Y is a subset of X, then X → Y holds by the reflexivity rule.
Example: {roll_no, name} → name is valid.
• Augmentation: If X → Y is a valid dependency, then XZ → YZ is also
valid by the augmentation rule.
Example: {roll_no, name} → dept_building is valid, hence {roll_no,
name, dept_name} → {dept_building, dept_name} is also valid.
• Transitivity: If X → Y and Y → Z are both valid dependencies, then
X → Z is also valid by the transitivity rule.
Example: roll_no → dept_name and dept_name → dept_building, hence
roll_no → dept_building is also valid.
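Armstrong's axioms are usually applied through the attribute-closure algorithm: X → Y follows from a set of FDs exactly when Y is contained in X+, the set of all attributes derivable from X. A minimal Python sketch, using the roll_no / dept_name / dept_building dependencies from the examples above (the function name `closure` is our own):

```python
def closure(attrs, fds):
    """Compute X+, all attributes derivable from attrs under fds.

    fds is a list of (lhs, rhs) pairs of attribute sets. Starting from X
    (reflexivity), whenever a whole LHS is already derivable, its RHS is
    added (augmentation plus transitivity), until nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    ({"roll_no"}, {"name", "dept_name"}),
    ({"dept_name"}, {"dept_building"}),
]

print(sorted(closure({"roll_no"}, fds)))
# ['dept_building', 'dept_name', 'name', 'roll_no']
```

Because `dept_building` is in {roll_no}+, the dependency roll_no → dept_building is implied, which is the transitivity example above.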
Advantages of Functional Dependencies
• Functional dependencies have numerous applications in the field of database
management systems. Some of them are listed below:
• 1. Data Normalization
• Data normalization is the process of organizing data in a database in order to
minimize redundancy and increase data integrity. Functional dependencies play
an important part in data normalization: with their help we can identify the
candidate keys and the primary key of a table, which in turn guides normalization.
• 2. Query Optimization
• With the help of functional dependencies we can decide how tables are
connected and which attributes need to be projected to retrieve
the required data from the tables. This helps in query optimization and improves
performance.
• 3. Consistency of Data
• Normalizing tables according to their functional dependencies removes
redundancies that could otherwise become inconsistent. Because each fact is then
stored once, a change to one attribute cannot leave another copy of the same
fact out of date, which keeps the data in the database consistent.
• 4. Data Quality Improvement
• Enforcing functional dependencies helps keep the data in the database
accurate and up to date. This improves the overall quality of the data and
reduces errors and inaccuracies that might otherwise affect data analysis and
decision making.
RDD
• Relational database design (RDD) models information and data into a set of
tables with rows and columns. Each row of a relation/table represents a
record, and each column represents an attribute of data. The Structured
Query Language (SQL) is used to manipulate relational databases. The
design of a relational database is composed of four stages, where the data
are modeled into a set of related tables. The stages are:
• Define relations/attributes
• Define primary keys
• Define relationships
• Normalization
• In an RDD, the data are organized into tables and all types of data
access are carried out via controlled transactions.
• Relational database systems support the ACID (atomicity, consistency,
isolation and durability) properties expected of transactions.
• Relational database design typically relies on a database server in
applications to deal with data management problems.
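Atomicity can be seen in a small transfer between two accounts. This sketch assumes SQLite via Python's `sqlite3` module, where using the connection as a context manager commits on success and rolls back when an exception escapes; the `account` table and `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint makes a negative balance an error.
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src in one transaction: both updates or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


ok_small = transfer(conn, "A", "B", 30)   # succeeds
ok_large = transfer(conn, "A", "B", 500)  # debit would make A negative
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
print(ok_small, ok_large, balances)  # True False {'A': 70, 'B': 80}
```

In the failed transfer the credit to B has already run when the debit fails; the rollback undoes it, so B keeps 80 rather than 580.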
The four stages of an RDD
• Relations and attributes: The various tables and attributes related to each table are identified. The tables
represent entities, and the attributes represent the properties of the respective entities.
• Primary keys: The attribute or set of attributes that uniquely identifies a record is identified and
assigned as the primary key.
• Relationships: The relationships between the various tables are established with the help of foreign keys.
Foreign keys are attributes occurring in a table that are primary keys of another table. The types of
relationships that can exist between the relations (tables) are:
• One to one
• One to many
• Many to many
• An entity-relationship diagram can be used to depict the entities, their attributes and the relationship
between the entities in a diagrammatic way.
• Normalization: This is the process of optimizing the database structure. Normalization simplifies the
database design to avoid redundancy and confusion. The different normal forms are as follows:
• First normal form
• Second normal form
• Third normal form
• Boyce-Codd normal form
• Fourth normal form
• Fifth normal form
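The relationship types in the third stage map onto foreign keys in a standard way: a one-to-many relationship places a foreign key on the "many" side, and a many-to-many relationship uses a separate junction table whose primary key combines two foreign keys (a one-to-one relationship can be expressed with a foreign key that is also declared UNIQUE). The sketch below assumes SQLite via Python's `sqlite3`; all table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.executescript("""
-- One-to-many: a department has many students; each student has one department.
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES department(dept_id)  -- foreign key on the "many" side
);

-- Many-to-many: a junction table holds one row per (student, course) pair.
CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE enrolled_in (
    roll_no   INTEGER REFERENCES student(roll_no),
    course_id TEXT    REFERENCES course(course_id),
    PRIMARY KEY (roll_no, course_id)
);

INSERT INTO department VALUES (1, 'CSE');
INSERT INTO student VALUES (101, 'Asha', 1), (102, 'Ravi', 1);
INSERT INTO course VALUES ('DB', 'Databases'), ('OS', 'Operating Systems');
INSERT INTO enrolled_in VALUES (101, 'DB'), (101, 'OS'), (102, 'DB');
""")

# Join through the junction table to list the students taking 'DB'.
db_students = [
    name
    for (name,) in conn.execute(
        "SELECT s.name FROM student s JOIN enrolled_in e ON s.roll_no = e.roll_no "
        "WHERE e.course_id = 'DB' ORDER BY s.roll_no"
    )
]
print(db_students)  # ['Asha', 'Ravi']
```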
Normalization
• Database normalization is the process of organizing the attributes of
the database to reduce or eliminate data redundancy (having the
same data but at different places).
• Problems because of data redundancy: Data redundancy
unnecessarily increases the size of the database as the same data is
repeated in many places. Inconsistency problems also arise during
insert, delete and update operations.
Features of Normalization
• Elimination of Data Redundancy: Data redundancy refers to the repetition of data in
different parts of the database. Normalization helps in reducing or eliminating this
redundancy, which can improve the efficiency and consistency of the database.
• Ensuring Data Consistency: Normalization helps in ensuring that the data in the database
is consistent and accurate. By eliminating redundancy, normalization helps in preventing
inconsistencies and contradictions that can arise due to different versions of the same
data.
• Simplification of Data Management: Normalization simplifies the process of managing
data in a database. By breaking down a complex data structure into simpler tables,
normalization makes it easier to manage the data, update it, and retrieve it.
• Improved Database Design: Normalization helps in improving the overall design of the
database. By organizing the data in a structured and systematic way, normalization
makes it easier to design and maintain the database. It also makes the database more
flexible and adaptable to changing business needs.
Features of Normalization
• Avoiding Update Anomalies: Normalization helps in avoiding update
anomalies, which occur when the same fact is stored in several rows and
an update changes some copies but not the others. Normalization ensures that
each table describes one type of entity or relationship and that the relationships
between the tables are clearly defined, which helps in avoiding such
anomalies.
• Standardization: Normalization helps in standardizing the data in the
database. By organizing the data into tables and defining relationships
between them, normalization helps in ensuring that the data is stored
in a consistent and uniform manner.
Normal Forms
• First Normal Form (1NF): This is the most basic level of normalization. In 1NF,
each table cell should contain only a single (atomic) value, and each column should have a
unique name. The first normal form removes repeating groups and simplifies
queries.
• Second Normal Form (2NF): A relation is in 2NF if it is in 1NF and every non-prime
attribute is fully functionally dependent on every candidate key. This means that no
non-prime column may depend on only part of a composite key.
• Third Normal Form (3NF): 3NF builds on 2NF by requiring that no non-prime
attribute is transitively dependent on a candidate key. This means that each non-key column
should depend directly on the key, and not on any other non-key column in the same
table.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that requires the
determinant of every non-trivial functional dependency in a table to be a superkey. Unlike
3NF, BCNF applies this rule even when the dependent attribute is part of a candidate key.
1. First Normal Form –
• A relation is in first normal form if every attribute in that relation is a
single-valued (atomic) attribute. If a relation contains a composite or
multi-valued attribute, it violates first normal form.
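A minimal sketch of bringing a relation with a multi-valued attribute into 1NF by giving each value its own row. The employee data and the `to_1nf` helper are hypothetical; another common option is to move the multi-valued attribute into a separate table keyed by (emp_id, phone).

```python
# Hypothetical unnormalized data: "phone" is multi-valued, which violates 1NF.
unnormalized = [
    {"emp_id": 1, "name": "Asha", "phone": ["98200", "98201"]},
    {"emp_id": 2, "name": "Ravi", "phone": ["99300"]},
]


def to_1nf(rows, multi_valued):
    """Return rows in which the multi-valued attribute is split so that
    every attribute of every row holds a single atomic value."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = {k: v for k, v in row.items() if k != multi_valued}
            new_row[multi_valued] = value  # one atomic value per row
            flat.append(new_row)
    return flat


flat = to_1nf(unnormalized, "phone")
for r in flat:
    print(r)
# {'emp_id': 1, 'name': 'Asha', 'phone': '98200'}
# {'emp_id': 1, 'name': 'Asha', 'phone': '98201'}
# {'emp_id': 2, 'name': 'Ravi', 'phone': '99300'}
```

After flattening, emp_id alone no longer identifies a row; the key of the flattened relation is {emp_id, phone}.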
Example 1NF
2. Second Normal Form –
• A table is in 2NF only if it is in 1NF and every non-prime attribute is
fully functionally dependent on the primary key.
In the <StudentProject> relation, the key is {StudentID, ProjectID}. A non-prime attribute is
partially dependent if it is functionally dependent on only part of a candidate key.
StudentName can be determined by StudentID alone, so it is partially dependent on the key.
ProjectName can be determined by ProjectID alone, so it is partially dependent on the key.
Therefore, the <StudentProject> relation violates 2NF and is considered a bad database
design.
Solution
• To remove the partial dependencies and the 2NF violation, decompose the
table above:
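Partial dependencies can be found by computing the closure of every proper subset of the candidate key and checking which non-prime attributes it determines. This sketch assumes the dependencies described above, StudentID → StudentName and ProjectID → ProjectName, with {StudentID, ProjectID} as the candidate key; the helper names are our own.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """List (part_of_key, attribute) pairs where a proper subset of the
    candidate key already determines a non-prime attribute."""
    found = []
    for size in range(1, len(key)):  # proper subsets only
        for part in combinations(sorted(key), size):
            for attr in sorted(non_prime & closure(set(part), fds)):
                found.append((part, attr))
    return found


fds = [
    ({"StudentID"}, {"StudentName"}),
    ({"ProjectID"}, {"ProjectName"}),
]
found = partial_dependencies({"StudentID", "ProjectID"}, {"StudentName", "ProjectName"}, fds)
print(found)
# [(('ProjectID',), 'ProjectName'), (('StudentID',), 'StudentName')]
```

Each reported pair points to one table of the decomposition: the part of the key becomes the primary key of a new table holding the dependent attribute.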
3NF
• Although Second Normal Form (2NF) relations have less redundancy
than those in 1NF, they may still suffer from update anomalies. If the same
fact is stored in two tuples and we update only one of them, the database
would be in an inconsistent state.
• This update anomaly is caused by a transitive dependency. We need
to remove such dependencies by progressing to Third Normal Form
(3NF).
Third Normal Form (3NF):
FD set:
{STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key:
{STUD_NO}
Solution
For this relation in table 4, STUD_NO -> STUD_STATE and
STUD_STATE -> STUD_COUNTRY are true. So STUD_COUNTRY is
transitively dependent on STUD_NO. It violates the third
normal form. To convert it into third normal form, we
decompose the relation STUDENT (STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
Convert into 3NF
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
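The decomposition can be checked on data. The sketch below projects some illustrative STUDENT rows (invented for this example) onto the two 3NF relations and rejoins them on the state to confirm that no information is lost; each state-to-country fact is then stored only once.

```python
# Illustrative STUDENT rows (invented values).
# Columns: STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE
student = [
    (1, "RAM", "9716271721", "HARYANA", "INDIA", 20),
    (2, "RAM", "9898291281", "PUNJAB", "INDIA", 19),
    (3, "SURESH", "7898291981", "PUNJAB", "INDIA", 21),
]

# Project onto the two 3NF relations; sets drop duplicate rows, as relations do.
student_3nf = {(no, name, phone, state, age) for no, name, phone, state, _, age in student}
state_country = {(state, country) for _, _, _, state, country, _ in student}

# STUD_STATE -> STUD_COUNTRY is now stored once per state.
print(sorted(state_country))  # [('HARYANA', 'INDIA'), ('PUNJAB', 'INDIA')]

# Natural join on the state rebuilds the original relation (lossless decomposition).
rejoined = {
    (no, name, phone, state, country, age)
    for no, name, phone, state, age in student_3nf
    for s, country in state_country
    if s == state
}
print(rejoined == set(student))  # True
```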
Solution
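• Super keys in the table above:
• {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
• Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
• EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID, so
EMP_STATE and EMP_CITY are transitively dependent on EMP_ID, which violates 3NF.
• That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.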
Solution
BCNF
• BCNF is an advanced version of 3NF. It is stricter than 3NF.
• A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a
superkey of the table.
• For BCNF, the table should be in 3NF, and for every non-trivial FD, the LHS must be a
superkey.
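The BCNF test can be automated with attribute closure: a left-hand side X is a superkey when X+ contains every attribute of the relation. The sketch below applies it to the two dependencies in the company example that follows, under the assumption that the relation has exactly the five attributes they mention. Checking only the given FDs is enough for the relation they were stated for; checking a decomposed table requires the FDs projected onto it.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial FDs X -> Y whose X is not a superkey (X+ does not cover the schema)."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not schema <= closure(lhs, fds)
    ]


# Assumption: the relation has exactly these five attributes.
schema = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]

violations = bcnf_violations(schema, fds)
print([sorted(lhs) for lhs, _ in violations])  # [['EMP_ID'], ['EMP_DEPT']]

# A table holding only the department attributes passes the same test.
dept_schema = {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
print(bcnf_violations(dept_schema, [fds[1]]))  # []
```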
Example: A company where employees work in more than one department.
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys: