Notes On Normalization of Databases Normalization Is Due To E. F. Codd - Creator of The Relational Database Management
delete: when deleting a value for an attribute, you inadvertently lose the value for
some other attribute
insert: you need to store a value for a particular attribute but can't, because the
row also requires some other value that you don't yet have (typically the key value)
update: to change a value, you need to find and change every instance of it, and
those instances may be hard to find.
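The delete and update anomalies above can be sketched with a small in-memory table. The table, its attribute names, and the helper `instructor_of` are hypothetical, chosen only for illustration.

```python
# Hypothetical single-table design: each row mixes enrollment and instructor facts.
enrollment = [
    {"student": "Ann", "course": "DB101", "instructor": "Lee"},
    {"student": "Bob", "course": "DB101", "instructor": "Lee"},
    {"student": "Ann", "course": "OS201", "instructor": "Kim"},
]

def instructor_of(rows, course):
    """Return the instructor recorded for a course, or None if no row mentions it."""
    for row in rows:
        if row["course"] == course:
            return row["instructor"]
    return None

# Delete anomaly: Ann drops OS201, which was the only enrollment in that course.
after_delete = [
    r for r in enrollment
    if not (r["student"] == "Ann" and r["course"] == "OS201")
]
lost = instructor_of(after_delete, "OS201")  # the fact "Kim teaches OS201" is gone

# Update anomaly: changing DB101's instructor means touching every row that repeats it.
rows_to_touch = sum(1 for r in enrollment if r["course"] == "DB101")
```

The insert anomaly is the mirror image: in this design a new course and its instructor cannot be recorded until at least one student enrolls.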
Codd originally specified three normal forms and showed that any or all of the
anomalies could occur in the various normal forms. Higher normal forms have since
been described by Codd and others. The process of normalization is that of creating
relations which are in a high enough normal form so as to avoid anomalies. Generally,
the higher the normal form the better.
The process of taking a relation and splitting it up into multiple relations is called
decomposition. The purpose of decomposing relations is to avoid anomalies.
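As a rough sketch of decomposition, the following splits one relation into two by projection and then checks that joining them on the shared attribute gives back the original rows. The relation and the helpers `project` and `natural_join` are assumptions made for this example.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

def natural_join(left, right, on):
    """Join two lists of rows on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]

# Hypothetical relation: the department phone is repeated for every employee.
employees = [
    {"emp": "Ann", "dept": "Sales", "dept_phone": "555-0100"},
    {"emp": "Bob", "dept": "Sales", "dept_phone": "555-0100"},
    {"emp": "Cy", "dept": "IT", "dept_phone": "555-0111"},
]

emp_dept = project(employees, ["emp", "dept"])          # who works where
dept_info = project(employees, ["dept", "dept_phone"])  # one row per department
rejoined = natural_join(emp_dept, dept_info, on="dept")
```

After the split, each department phone is stored once, so changing it touches a single row.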
key: an attribute (or field), or combination of attributes, whose value identifies
the entire tuple (or record) as unique
candidate key: any attribute (or minimal combination of attributes) which might serve
as a key; a relation may have several
primary key: the candidate key selected by the database administrator as the key we
will use for that relation
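One way to explore candidate keys is to look for minimal attribute combinations whose values are unique in sample data. This is only a sketch: the function `candidate_keys` and the sample rows are hypothetical, and uniqueness in a sample merely suggests a key. The real candidate keys follow from what the data means.

```python
from itertools import combinations

def candidate_keys(rows, attributes):
    """Return minimal attribute combinations whose values are unique in rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key already found, so not minimal
            values = [tuple(r[a] for a in combo) for r in rows]
            if len(set(values)) == len(values):
                keys.append(combo)
    return keys

# Hypothetical sample data.
staff = [
    {"emp_id": 1, "email": "a@example.org", "dept": "Sales"},
    {"emp_id": 2, "email": "b@example.org", "dept": "Sales"},
    {"emp_id": 3, "email": "c@example.org", "dept": "IT"},
]
found = candidate_keys(staff, ["emp_id", "email", "dept"])
```

Here both `emp_id` and `email` qualify. Choosing one of them as the primary key is the designer's decision.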
For example, Social Security Number determines last name so last name is
functionally dependent on SSN but the reverse is not true.
Sometimes you might ask yourself the question "If I know what X is, do I
automatically know what Y will be?" If so, then X → Y (or simply said: "X determines
Y").
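The question "if I know X, do I know Y?" can be tested against sample rows. The function `determines` and the data below are hypothetical. Note that sample data can only disprove a dependency. It cannot prove that one holds in general.

```python
def determines(rows, x, y):
    """Return True if no two rows agree on attributes x but differ on attributes y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # Remember the first y seen for this x; any later mismatch breaks X -> Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical people: two different SSNs share the last name Smith.
people = [
    {"ssn": "111", "last_name": "Smith"},
    {"ssn": "222", "last_name": "Smith"},
    {"ssn": "333", "last_name": "Jones"},
]
ssn_to_name = determines(people, ["ssn"], ["last_name"])
name_to_ssn = determines(people, ["last_name"], ["ssn"])
```

In this sample, SSN determines last name, but last name does not determine SSN, matching the example above.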
A functional dependency diagram (FDD) is often helpful. Put each attribute in a block.
Dependencies are shown with arrows. (Don't confuse this with E-R diagrams.) Key
fields are usually shown on the left.
If you have multiple fields in a key (a composite key), each of the key attributes
would be enclosed in an outer box with dependency arrows from the appropriate block
(hopefully the outside one).
transitive dependency: a dependency between non-key attributes, not involving any part
of the primary key (here, given that A is the key, an arrow from B to C would indicate
a transitive dependency: C depends on A only through B)
Dependency arrows can be used with this representation too but a designer has to be
diligent to avoid missing something.
The normal forms are hierarchical in nature -- each higher form requires that the
relation already satisfy all lower forms. Decomposition should cause the resulting
relations to achieve a higher normal form. Here we will concern ourselves with four
normal forms but there are still higher forms which a relation may require to avoid
anomalies.
A relation is in First Normal Form (1NF) if and only if all underlying values are atomic.
That means only one piece of data can be stored within the field (attribute) of a
particular record (tuple).
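A minimal sketch of repairing a 1NF violation, assuming a hypothetical table in which one field holds a comma-separated list. The function `to_first_normal_form` and the data are illustrative only.

```python
# Hypothetical rows: "phones" holds several values in one field, violating 1NF.
unnormalized = [
    {"student": "Ann", "phones": "555-0100, 555-0101"},
    {"student": "Bob", "phones": "555-0199"},
]

def to_first_normal_form(rows, attr, sep=","):
    """Emit one row per individual value found in the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[attr].split(sep):
            flat.append({**row, attr: value.strip()})
    return flat

atomic = to_first_normal_form(unnormalized, "phones")
```

Each resulting row stores exactly one phone number, so the student name alone no longer identifies a row. The key becomes the combination of student and phone.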
A relation is in Second Normal Form (2NF) if and only if it (a) is in 1NF and (b) every
non-key attribute is fully dependent on the primary key. Dependency on other
attributes may also occur but dependency on a part of the primary key is not sufficient.
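Given a list of dependencies the designer has already identified, 2NF violations can be flagged mechanically. The sketch below assumes a hypothetical enrollment relation and a hypothetical helper `partial_dependencies`. It reports non-key attributes determined by only part of a composite key.

```python
def partial_dependencies(primary_key, fds):
    """Return FDs whose determinant is a proper subset of the primary key.

    fds is a list of (determinant_attributes, dependent_attribute) pairs.
    """
    key = frozenset(primary_key)
    return [
        (sorted(lhs), rhs)
        for lhs, rhs in fds
        if frozenset(lhs) < key and rhs not in key
    ]

# Hypothetical relation with composite key (student_id, course_id).
key = ["student_id", "course_id"]
fds = [
    ({"student_id", "course_id"}, "grade"),  # depends on the whole key: fine
    ({"student_id"}, "student_name"),        # depends on part of the key
    ({"course_id"}, "course_title"),         # depends on part of the key
]
partial = partial_dependencies(key, fds)
```

Each reported dependency suggests a relation to split off, here one for students and one for courses.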
A relation is in Third Normal Form (3NF) if and only if (a) it is in 2NF and (b) no
non-key attribute is transitively dependent on the primary key. An attribute not in
the key cannot be determined by another attribute that is not in the key.
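The 3NF condition can be checked the same way against a list of known dependencies. This sketch follows the simplified definition used in these notes (a dependency whose determinant involves no part of the primary key). The relation and the helper `transitive_dependencies` are hypothetical.

```python
def transitive_dependencies(primary_key, fds):
    """Return FDs in which a non-key attribute is determined by non-key attributes.

    fds is a list of (determinant_attributes, dependent_attribute) pairs.
    """
    key = set(primary_key)
    return [
        (sorted(lhs), rhs)
        for lhs, rhs in fds
        if not (set(lhs) & key) and rhs not in key
    ]

# Hypothetical relation with key emp_id.
key = ["emp_id"]
fds = [
    ({"emp_id"}, "dept"),      # key -> dept: fine
    ({"dept"}, "dept_phone"),  # non-key -> non-key: transitive via dept
]
violations = transitive_dependencies(key, fds)
```

The reported dependency indicates that department and department phone belong in a relation of their own, keyed on the department.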
It may well be that a given relation could be decomposed properly in more than one
way. Usually, one way will suit the current needs of the enterprise better than the
others. Choose the decomposition that is most useful.
can be decomposed into these two relations (a good decomposition – join on A),