Normalization
Normalization is the process of organizing a database into tables and columns to minimize redundancy and undesirable dependencies. The goal is to divide large tables into smaller, related tables and define relationships between them, so that data operations do not cause anomalies.

Aim of Normalization
• Eliminate data redundancy: Avoid storing the same piece of data in multiple places, which can lead to inconsistency and wasted storage.
• Prevent data anomalies: Keep data consistent and reliable by avoiding update, insert, and delete anomalies.
• Ensure data integrity: Maintain the relationships between data correctly through the use of keys and constraints.
• Efficient organization: Break a large, complex table into smaller, more manageable tables while preserving relationships.
• Improve query performance: Well-structured tables without redundancy store less data, which can speed up updates and many queries, although queries that must join several tables can become slower.

Types of Normal Forms
• 1NF
• 2NF
• 3NF
• BCNF

Unnormalized Table
• Consider an unnormalized table that stores orders, customer information, and product details in one table. This unnormalized structure can lead to redundancy and anomalies when inserting, updating, or deleting data.

Problems in the unnormalized table:
• Redundancy: John's phone number repeats in multiple rows. Similarly, supplier details repeat for every order.
• Update Anomalies: If "ABC Corp" changes its phone number, we must update every row where "ABC Corp" appears. Missing one row leaves the data inconsistent.
• Insertion Anomalies: We can't record a new customer until that customer places an order.
• Deletion Anomalies: If we delete John's orders, we lose his contact details.

First Normal Form (1NF)
• Goal of 1NF: Remove repeating groups and ensure atomic (indivisible) values in each field.
• What to do:
  • Ensure that each column contains atomic values (no sets or arrays).
  • Ensure that each row is unique, identified by a primary key.
• In this case, the table is already atomic (no repeating groups within columns), but it is still not well structured because redundant data remains.

First Normal Form
• First Normal Form (1NF) is the first step in database normalization. A table is in 1NF if it meets the following criteria:
1. All values in the table are atomic, meaning each column contains only indivisible values (no sets or lists).
2. Each record is unique, meaning there are no duplicate rows.
3. Each column contains values of a single type.

Example of a Table Not in 1NF
• Problems: The PhoneNumbers column contains multiple values (a list of phone numbers for each student).
• The Courses column contains multiple values (a list of courses for each student).

Converting the Table to 1NF
• Store each phone number and each course in its own row, so that every field holds a single value.

Second Normal Form (2NF)
• Second Normal Form (2NF) builds upon the rules of First Normal Form (1NF) and ensures that the table is free from partial dependencies.
• A table is in 2NF if:
  • It is in 1NF.
  • No non-prime attribute (an attribute that is not part of any candidate key) depends on only part of a candidate key (i.e., no partial dependency).

What is Partial Dependency?
• A partial dependency occurs when a non-prime attribute depends on only part of a composite key, rather than on the whole key. This problem arises only when the key is composite (made of more than one attribute).

Step 1: Consider a Table with a Composite Key (Not in 2NF)

Step 2: Identifying the Partial Dependencies
• StudentID → StudentName: StudentName depends only on StudentID (not on CourseID), which is just one part of the composite key. This is a partial dependency because StudentName does not depend on the whole composite key (StudentID, CourseID).
• CourseID → Department: Department depends only on CourseID, not on the whole composite key. This is also a partial dependency because Department is related only to the course and not to the entire composite key (StudentID, CourseID).

Step 3: Breaking Partial Dependency (Convert to 2NF)
• Move StudentName into a table keyed by StudentID and Department into a table keyed by CourseID, and keep the (StudentID, CourseID) pairs in their own table.

Third Normal Form (3NF) and Transitive Dependency
• Third Normal Form (3NF) is a level of database normalization that ensures there are no transitive dependencies in the relation (table). A table is in 3NF if it satisfies the following conditions:
1. The table is in Second Normal Form (2NF).
2. There is no transitive dependency.

What is a Transitive Dependency?
• A transitive dependency occurs when a non-prime attribute (an attribute that is not part of any candidate key) depends on another non-prime attribute, which in turn depends on the primary key.
• Formally: If A → B and B → C, then A → C is a transitive dependency.

Step 1: Example of a Table with Transitive Dependency
• In this table: StudentID → CourseID (StudentID determines CourseID).
• CourseID → CourseName and CourseID → InstructorName (CourseID determines CourseName and InstructorName).
• This introduces a transitive dependency: StudentID → CourseID and CourseID → CourseName, so StudentID → CourseName is a transitive dependency.

Step 2: Breaking Transitive Dependency to Achieve 3NF
• To remove the transitive dependency, we need to split the table into two tables:
1. Students Table (containing data related to students only)
2. Courses Table (containing data related to courses)

Table in 1NF:

Second Normal Form (2NF)
• Goal of 2NF: Remove partial dependencies by ensuring that every non-key attribute is fully functionally dependent on the primary key.
• Steps:
1. The table should be in 1NF.
2. Remove partial dependencies, i.e., move data that doesn't fully depend on the primary key to separate tables.
• In this table:
  • CustomerName and CustomerPhone depend on the customer, not the order.
  • Supplier and SupplierPhone depend on the supplier, not the order.
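
The split described above for the orders table can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the OrderID and Product columns, the table names, every sample value, and the use of CustomerName and Supplier as keys are hypothetical and do not come from the original table. A real design would more likely use dedicated ID columns.

```python
import sqlite3

# Hypothetical rows: (OrderID, CustomerName, CustomerPhone, Product, Supplier, SupplierPhone).
# OrderID, Product, and all values are illustrative assumptions.
ROWS = [
    (1, "John", "555-0100", "Laptop", "ABC Corp", "555-0200"),
    (2, "John", "555-0100", "Mouse", "ABC Corp", "555-0200"),
    (3, "Mary", "555-0101", "Desk", "XYZ Ltd", "555-0201"),
]


def build_normalized(rows):
    """Load the flat rows, then move customer and supplier facts into their own tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE flat_orders (
            OrderID INTEGER PRIMARY KEY,
            CustomerName TEXT, CustomerPhone TEXT,
            Product TEXT, Supplier TEXT, SupplierPhone TEXT
        );
        CREATE TABLE customers (CustomerName TEXT PRIMARY KEY, CustomerPhone TEXT);
        CREATE TABLE suppliers (Supplier TEXT PRIMARY KEY, SupplierPhone TEXT);
        CREATE TABLE orders (
            OrderID INTEGER PRIMARY KEY,
            CustomerName TEXT REFERENCES customers(CustomerName),
            Product TEXT,
            Supplier TEXT REFERENCES suppliers(Supplier)
        );
    """)
    conn.executemany("INSERT INTO flat_orders VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Each customer and each supplier is stored exactly once.
    conn.execute(
        "INSERT INTO customers SELECT DISTINCT CustomerName, CustomerPhone FROM flat_orders"
    )
    conn.execute(
        "INSERT INTO suppliers SELECT DISTINCT Supplier, SupplierPhone FROM flat_orders"
    )
    # Orders keep only the facts about the order plus references to the other tables.
    conn.execute(
        "INSERT INTO orders SELECT OrderID, CustomerName, Product, Supplier FROM flat_orders"
    )
    conn.commit()
    return conn


conn = build_normalized(ROWS)

# Update: the supplier's phone number now lives in a single row.
cur = conn.execute(
    "UPDATE suppliers SET SupplierPhone = '555-0299' WHERE Supplier = 'ABC Corp'"
)
updated_rows = cur.rowcount

# Delete: removing John's orders no longer removes his contact details.
conn.execute("DELETE FROM orders WHERE CustomerName = 'John'")
john_phone = conn.execute(
    "SELECT CustomerPhone FROM customers WHERE CustomerName = 'John'"
).fetchone()[0]

print(updated_rows, john_phone)
```

In this sketch, changing the phone number for "ABC Corp" touches one row instead of every matching order, and John's phone number survives the deletion of his orders. A new customer could also be added to the customers table before placing any order.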