Database Normalization

Database normalization is the process of organizing data in a database to reduce data redundancy and improve data integrity. It involves dividing large tables with anomalies into smaller well-structured tables and defining relationships between them. The objectives of normalization include removing data anomalies, minimizing redesign when extending the database structure, making the data model more informative to users, and avoiding bias towards any particular querying pattern. Edgar Codd, inventor of the relational model, introduced the concept of normalization and the first normal form in 1970.

Uploaded by

Shantel Mucheke

Database normalization

From Wikipedia, the free encyclopedia


In relational database management system (RDBMS) design, the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations. Normalization usually involves dividing large, badly-formed tables into smaller, well-formed tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970.[1] Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971,[2] and Codd and Raymond F. Boyce defined the Boyce-Codd normal form (BCNF) in 1974.[3] Higher normal forms were defined by other theorists in subsequent years, the most recent being the Sixth normal form (6NF) introduced by Chris Date, Hugh Darwen, and Nikos Lorentzos in 2002.[4]
Informally, a relational database table (the computerized representation of a relation) is often described as "normalized" if it is in the Third Normal Form.[5] Most 3NF tables are free of insertion, update, and deletion anomalies, i.e. in most cases 3NF tables adhere to BCNF, 4NF, and 5NF (but typically not 6NF).
A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed for performance reasons.[6] However, some modeling disciplines, such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF.[7]
Contents
1 Objectives of normalization
    1.1 Free the database of modification anomalies
    1.2 Minimize redesign when extending the database structure
    1.3 Make the data model more informative to users
    1.4 Avoid bias towards any particular pattern of querying
    1.5 Example
2 Background to normalization: definitions
3 Normal forms
4 Denormalization
    4.1 Non-first normal form (NF² or N1NF)
5 See also
6 Notes and references
7 Further reading
8 External links
Objectives of normalization
A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated using a "universal data sub-language" grounded in first-order logic.[8] (SQL is an example of such a data sub-language, albeit one that Codd regarded as seriously flawed.)[9]
The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
    E.F. Codd, "Further Normalization of the Data Base Relational Model"[10]
The sections below give details of each of these objectives.
Free the database of modification anomalies
Figure: An update anomaly. Employee 519 is shown as having different addresses on different records.
Figure: An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded.
Figure: A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.
When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been sufficiently normalized. An insufficiently normalized table might have one or more of the following characteristics:
- The same information can be expressed on multiple rows; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of the employee's skills). If the update is not carried through successfully (that is, if the employee's address is updated on some records but not others), then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.
- There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code. Thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses except by setting the Course Code to null. This phenomenon is known as an insertion anomaly.
- There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member. This phenomenon is known as a deletion anomaly.
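The update anomaly described above can be reproduced on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names (employee_skills, employee, employee_skill), the column names, and the sample addresses and skills are assumptions made up for illustration; only employee number 519 comes from the figure caption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Insufficiently normalized: the address is repeated once per skill.
cur.execute("CREATE TABLE employee_skills (employee_id INTEGER, address TEXT, skill TEXT)")
cur.executemany(
    "INSERT INTO employee_skills VALUES (?, ?, ?)",
    [(519, "94 Chestnut Street", "Typing"),       # hypothetical sample rows
     (519, "94 Chestnut Street", "Shorthand")],
)

# A partial update (one row out of two) leaves conflicting addresses behind.
cur.execute(
    "UPDATE employee_skills SET address = ? WHERE employee_id = ? AND skill = ?",
    ("96 Walnut Avenue", 519, "Typing"),
)
addresses = {row[0] for row in cur.execute(
    "SELECT address FROM employee_skills WHERE employee_id = 519")}
anomaly_detected = len(addresses) > 1

# Normalized: the address is stored exactly once, skills live in their own table.
cur.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, address TEXT)")
cur.execute("CREATE TABLE employee_skill (employee_id INTEGER REFERENCES employee, skill TEXT)")
cur.execute("INSERT INTO employee VALUES (519, '94 Chestnut Street')")
cur.executemany("INSERT INTO employee_skill VALUES (?, ?)",
                [(519, "Typing"), (519, "Shorthand")])

# A single-row update now changes the address seen through every skill row.
cur.execute("UPDATE employee SET address = '96 Walnut Avenue' WHERE employee_id = 519")
normalized_addresses = [row[0] for row in cur.execute(
    "SELECT e.address FROM employee e JOIN employee_skill s USING (employee_id)")]
```

In the decomposed design an employee with no skills can still be recorded in employee, and deleting the last row for an employee from employee_skill does not delete the address, which is how the same split also avoids the insertion and deletion anomalies.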
Minimize redesign when extending the database structure
When a fully normalized database structure is extended to allow it to accommodate new types of data, the pre-existing aspects of the database structure can remain largely or entirely unchanged. As a result, applications interacting with the database are minimally affected.
Make the data model more informative to users
Normalized tables, and the relationship between one normalized table and another, mirror real-world concepts and their interrelationships.
Avoid bias towards any particular pattern of querying
Normalized tables are suitable for general-purpose querying. This means any queries against these tables, including future queries whose details cannot be anticipated, are supported. In contrast, tables that are not normalized lend themselves to some types of queries, but not others.
For example, consider an online bookseller whose customers maintain wishlists of books they'd like to have. For the obvious, anticipated query (what books does this customer want?) it's enough to store the customer's wishlist in the table as, say, a homogeneous string of authors and titles.
With this design, though, the database can answer only that one single query. It cannot by itself answer interesting but unanticipated queries: What is the most-wished-for book? Which customers are interested in WWII espionage? How does Lord Byron stack up against his contemporary poets? Answers to these questions must come from special adaptive tools completely separate from the database. One tool might be software written especially to handle such queries. This special adaptive software has just one single purpose: in effect to normalize the non-normalized field.
Unforeseen queries can be answered trivially, and entirely within the database framework, with a normalized table.
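To make the bookseller example concrete, here is a minimal sketch of a normalized wishlist, again using Python's sqlite3 module. The wishlist table, its columns, the customer names, and the sample books are hypothetical; the point is only that the anticipated query and the unanticipated ones run against the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per wished-for book instead of one string per customer.
conn.execute("CREATE TABLE wishlist (customer TEXT, author TEXT, title TEXT)")
conn.executemany("INSERT INTO wishlist VALUES (?, ?, ?)", [
    ("alice", "Lord Byron", "Don Juan"),
    ("alice", "John Keats", "Endymion"),
    ("bob", "Lord Byron", "Don Juan"),
    ("carol", "Lord Byron", "Don Juan"),
    ("carol", "John Keats", "Endymion"),
    ("carol", "Percy Shelley", "Adonais"),
])

# Anticipated query: what books does this customer want?
alice_books = [r[0] for r in conn.execute(
    "SELECT title FROM wishlist WHERE customer = 'alice' ORDER BY title")]

# Unanticipated query: what is the most wished-for book?
most_wished = conn.execute(
    "SELECT title, COUNT(*) AS n FROM wishlist "
    "GROUP BY title ORDER BY n DESC, title LIMIT 1").fetchone()

# Unanticipated query: how often is each author wished for?
per_author = dict(conn.execute(
    "SELECT author, COUNT(*) FROM wishlist GROUP BY author"))
```

With the wishlist stored as a single string per customer, the second and third queries would need string parsing outside the database before any counting could start.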
Example
Querying and manipulating the data within an unnormalized data structure, such as the following non-1NF representation of customers' credit card transactions, involves more complexity than is really necessary:
Customer    Transactions
Jones       Tr. ID    Date           Amount
            12890     14-Oct-2003    -87
            12904     15-Oct-2003    -50
Wilkins     Tr. ID    Date           Amount
            12898     14-Oct-2003    -21
Stevens     Tr. ID    Date           Amount
            12907     15-Oct-2003    -18
            14920     20-Nov-2003    -70
            15003     27-Nov-2003    -60
To each customer there corresponds a repeating group of transactions. The automated evaluation of any query relating to customers' transactions therefore would broadly involve two stages:
1. Unpacking one or more customers' groups of transactions allowing the individual transactions in a group to be examined, and
2. Deriving a query result based on the results of the first stage
For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all customers, the system would have to know that it must first unpack the Transactions group of each customer, then sum the Amounts of all transactions thus obtained where the Date of the transaction falls in October 2003.
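The two stages can be sketched in Python. This is an illustration only: the nested dictionary stands in for the non-1NF structure, the variable names are assumptions, and the amounts are read as negative numbers.

```python
from datetime import date

# Non-1NF structure: each customer carries a repeating group of transactions,
# stored as (transaction id, date, amount) tuples.
customers = {
    "Jones": [(12890, date(2003, 10, 14), -87), (12904, date(2003, 10, 15), -50)],
    "Wilkins": [(12898, date(2003, 10, 14), -21)],
    "Stevens": [(12907, date(2003, 10, 15), -18),
                (14920, date(2003, 11, 20), -70),
                (15003, date(2003, 11, 27), -60)],
}

# Stage 1: unpack every customer's group into individual transactions.
unpacked = [tx for group in customers.values() for tx in group]

# Stage 2: derive the result from the unpacked transactions.
october_total = sum(amount for _tr_id, when, amount in unpacked
                    if (when.year, when.month) == (2003, 10))
```

The caller has to know about the nesting in order to write stage 1; that knowledge is the extra complexity the text refers to.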
One of Codd's important insights was that this structural complexity could always be removed completely, leading to much greater power and flexibility in the way queries could be formulated (by users and applications) and evaluated (by the DBMS). The normalized equivalent of the structure above would look like this:
Customer    Tr. ID    Date           Amount
Jones       12890     14-Oct-2003    -87
Jones       12904     15-Oct-2003    -50
Wilkins     12898     14-Oct-2003    -21
Stevens     12907     15-Oct-2003    -18
Stevens     14920     20-Nov-2003    -70
Stevens     15003     27-Nov-2003    -60
Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest simply by finding all rows with a Date falling in October 2003 and summing their Amounts. The data structure places all of the values on an equal footing, exposing each to the DBMS directly, so each can potentially participate directly in queries; whereas in the previous situation some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the unnormalized design does not.
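As a rough sketch of how the normalized table is queried, the example below loads the six rows into an in-memory SQLite database. The table name transactions, the column names, and the ISO date format are assumptions chosen for the example, and the amounts are read as negative numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "customer TEXT, tr_id INTEGER PRIMARY KEY, tr_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    ("Jones", 12890, "2003-10-14", -87),
    ("Jones", 12904, "2003-10-15", -50),
    ("Wilkins", 12898, "2003-10-14", -21),
    ("Stevens", 12907, "2003-10-15", -18),
    ("Stevens", 14920, "2003-11-20", -70),
    ("Stevens", 15003, "2003-11-27", -60),
])

# One declarative query; no unpacking stage is needed.
(october_total,) = conn.execute(
    "SELECT SUM(amount) FROM transactions "
    "WHERE tr_date >= '2003-10-01' AND tr_date < '2003-11-01'").fetchone()

# The same rows answer other questions just as directly, e.g. totals per customer.
per_customer = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM transactions GROUP BY customer"))
```

Dates are stored as ISO strings here so that plain string comparison matches chronological order.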
Background to normalization: definitions
Functional dependency
    In a given table, an attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y value. For example, in an "Employee" table that includes the attributes "Employee ID" and "Employee Date of Birth", the functional dependency {Employee ID} → {Employee Date of Birth} would hold. It follows from the previous two sentences that each {Employee ID} is associated with precisely one {Employee Date of Birth}. In practice an {Employee Date of Birth} might be null, in which case an {Employee ID} is associated with no {Employee Date of Birth} at all, but no {Employee ID} is ever associated with more than one. The converse dependency, {Employee Date of Birth} → {Employee ID}, does not hold in general, because two employees can share a date of birth.
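A functional dependency can be tested against sample rows. The helper below is a minimal sketch; the function name holds, the column names, and the sample employees are all made up for illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in the given rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the
        # stored value afterwards, so a mismatch means x maps to two y values.
        if seen.setdefault(x, y) != y:
            return False
    return True


employees = [
    {"employee_id": 1, "date_of_birth": "1980-02-01"},
    {"employee_id": 2, "date_of_birth": "1975-07-19"},
    {"employee_id": 3, "date_of_birth": "1980-02-01"},  # same birthday as employee 1
]

id_determines_dob = holds(employees, ("employee_id",), ("date_of_birth",))
dob_determines_id = holds(employees, ("date_of_birth",), ("employee_id",))
```

A check like this can only refute a dependency. A dependency is a statement about every valid state of the table, so passing on one sample does not prove it.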
Trivial functional dependency
    A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.
Full functional dependency
    An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X, and not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.
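The difference between a functional dependency and a full functional dependency can be checked mechanically on sample data. This is a sketch under assumptions: the helper names holds and is_full, the column names, and the three sample rows are hypothetical.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    # Every subset with fewer attributes than lhs, including the empty one.
    proper_subsets = (s for n in range(len(lhs)) for s in combinations(lhs, n))
    return not any(holds(rows, s, rhs) for s in proper_subsets)


rows = [
    {"employee_id": 519, "skill": "Typing", "address": "94 Chestnut Street"},
    {"employee_id": 519, "skill": "Shorthand", "address": "94 Chestnut Street"},
    {"employee_id": 426, "skill": "Typing", "address": "87 Sycamore Grove"},
]

dependent = holds(rows, ("employee_id", "skill"), ("address",))
fully_on_pair = is_full(rows, ("employee_id", "skill"), ("address",))
fully_on_id = is_full(rows, ("employee_id",), ("address",))
```

The address depends on the pair {Employee ID, Skill} but not fully, because Employee ID alone already determines it.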
Transitive dependency
    A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y and Y → Z.
Multivalued dependency
    A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.
Join dependency
    A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.
Superkey
    A superkey is a combination of attributes that can be used to uniquely identify a database record. A table might have many superkeys.
Candidate key
    A candidate key is a minimal superkey: a superkey from which no attribute can be removed without losing the ability to uniquely identify records.
    Examples: Imagine a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>. This table has many possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. Of those listed, <SSN> is a candidate key. <SSN, Name> is not, because Name is not needed to identify a record. <Phone Extension, Name> is a candidate key only if neither Phone Extension nor Name identifies a record on its own.
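Whether a set of attributes is a superkey or a candidate key can be worked out from the functional dependencies by computing an attribute closure. The sketch below assumes two dependencies for the example table (SSN determines every other field, and Phone Extension together with Name determines SSN). Those dependencies and the function names are illustrative assumptions, not part of the definitions above.

```python
from itertools import combinations

ATTRS = frozenset({"Name", "Age", "SSN", "Phone Extension"})

# Assumed dependencies, for illustration only.
FDS = [
    (frozenset({"SSN"}), frozenset({"Name", "Age", "Phone Extension"})),
    (frozenset({"Phone Extension", "Name"}), frozenset({"SSN"})),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs):
    """A superkey determines every attribute of the table."""
    return closure(attrs, FDS) == ATTRS


def is_candidate_key(attrs):
    """A candidate key is a superkey with no proper subset that is a superkey."""
    attrs = frozenset(attrs)
    if not is_superkey(attrs):
        return False
    return not any(is_superkey(sub)
                   for n in range(len(attrs))
                   for sub in combinations(attrs, n))


listed_are_superkeys = all(is_superkey(k) for k in (
    {"SSN"}, {"Phone Extension", "Name"}, {"SSN", "Name"}))
ssn_is_candidate = is_candidate_key({"SSN"})
ssn_name_is_candidate = is_candidate_key({"SSN", "Name"})
```

The result depends entirely on which dependencies are assumed; with a different set of dependencies the same attribute sets could classify differently.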
Non-prime attribute
    A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.
Primary key
    Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.
Normal forms
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.
The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.
Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.
The main normal forms are summarized below.
Normal form | Defined by | Brief definition
First normal form (1NF) | Two versions: E.F. Codd (1970), C.J. Date (2003) [11] | Table faithfully represents a relation and has no repeating groups
Second normal form (2NF) | E.F. Codd (1971) [12] | No non-prime attribute in the table is functionally dependent on a proper subset of a candidate key
Third normal form (3NF) | E.F. Codd (1971) [13]; see also Carlo Zaniolo's equivalent but differently-expressed definition (1982) [14] | Every non-prime attribute is non-transitively dependent on every candidate key in the table
Boyce-Codd normal form (BCNF) | Raymond F. Boyce and E.F. Codd (1974) [15] | Every non-trivial functional dependency in the table is a dependency on a superkey
Fourth normal form (4NF) | Ronald Fagin (1977) [16] | Every non-trivial multivalued dependency in the table is a dependency on a superkey
Fifth normal form (5NF) | Ronald Fagin (1979) [17] | Every non-trivial join dependency in the table is implied by the superkeys of the table
Domain/key normal form (DKNF) | Ronald Fagin (1981) [18] | Every constraint on the table is a logical consequence of the table's domain constraints and key constraints
Sixth normal form (6NF) | C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002) [4] | Table features no non-trivial join dependencies at all (with reference to generalized join operator)
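The BCNF row of the table can be turned into a mechanical check: list every non-trivial functional dependency whose left-hand side is not a superkey. The sketch below applies it to a hypothetical "Employees' Skills" table and to its two-table decomposition; the attribute names and the single assumed dependency (employee_id determines address) are illustrative.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Non-trivial dependencies whose left-hand side is not a superkey of attrs."""
    attrs = set(attrs)
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        lhs_is_superkey = attrs <= closure(lhs, fds)
        if not trivial and not lhs_is_superkey:
            violations.append((lhs, rhs))
    return violations


# Assumed dependency: an employee has exactly one address.
fd_address = (("employee_id",), ("address",))

# Single table: employee_id alone does not determine skill, so it is not a superkey.
combined = bcnf_violations({"employee_id", "skill", "address"}, [fd_address])

# Decomposition: the dependency now sits in a table where employee_id is the key.
employee = bcnf_violations({"employee_id", "address"}, [fd_address])
employee_skill = bcnf_violations({"employee_id", "skill"}, [])
```

Only dependencies over a table's own attributes should be passed in; the check says nothing about multivalued or join dependencies, so it does not cover 4NF or 5NF.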
Denormalization
Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during extract, transform, load (ETL) processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas.
Denormalization is also used to improve performance on smaller computers as in computerized cash-registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or no changes are to be made to the data and a swift response is crucial.
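As an illustration of a look-up-only denormalized table, the sketch below derives a flat price_lookup table from two normalized tables in an in-memory SQLite database. All table names, column names, and product data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT,
                          price_cents INTEGER,
                          category_id INTEGER REFERENCES category);
    INSERT INTO category VALUES (1, 'Dairy'), (2, 'Bakery');
    INSERT INTO product VALUES (10, 'Milk', 129, 1),
                               (11, 'Butter', 249, 1),
                               (20, 'Bread', 199, 2);

    -- Denormalized copy for look-up only: category_name repeats on every row.
    CREATE TABLE price_lookup AS
        SELECT p.product_id, p.name, p.price_cents, c.category_name
        FROM product p JOIN category c ON c.category_id = p.category_id;
""")

# A price lookup against the flat table needs no join.
row = conn.execute(
    "SELECT name, price_cents, category_name FROM price_lookup WHERE product_id = ?",
    (11,)).fetchone()

lookup_rows = conn.execute("SELECT COUNT(*) FROM price_lookup").fetchone()[0]
```

The flat copy is only safe while it is treated as read-only and rebuilt from the normalized tables whenever they change, which corresponds to the controlled load step described above.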
Non-first normal form (NF² or N1NF)
In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). The languages used to query and manipulate data in the model must be extended accordingly to support such values.
One way of looking at this is to consider such structured values as being specialized types of values (domains), with their own domain-specific languages. However, what is usually meant by non-1NF models is the approach in which the relational model and the languages used to query it are extended with a general mechanism for such structure; for instance, the nested relational model supports the use of relations as domain values, by adding two additional operators (nest and unnest) to the relational algebra that can create and flatten nested relations, respectively.
Consider the following table:
First Normal Form
Person    Favorite Color
Bob       blue
Bob       red
Jane      green
Jane      yellow
Jane      red
Assume a person has several favorite colors. Obviously, favorite colors consist of a set of colors modeled by the given table. To transform a 1NF table into an NF² table a "nest" operator is required which extends the relational algebra of the higher normal forms. Applying the "nest" operator to the 1NF table yields the following NF² table:
Non-First Normal Form
Person    Favorite Colors
Bob       Favorite Color
          blue
          red
Jane      Favorite Color
          green
          yellow
          red
To transform this NF² table back into a 1NF table an "unnest" operator is required which extends the relational algebra of the higher normal forms. The unnest, in this case, would make "colors" into its own table.
Although "unnest" is the mathematical inverse to "nest", the operator "nest" is not always the mathematical inverse of "unnest". Another constraint required is for the operators to be bijective, which is covered by the Partitioned Normal Form (PNF).
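The behaviour of the two operators can be sketched in a few lines of Python. The functions nest and unnest below are simplified stand-ins for the relational-algebra operators, restricted to a two-column relation of (person, color) pairs; they are an illustration, not a general implementation.

```python
def nest(rows):
    """Group (person, color) rows into (person, set of colors) rows."""
    grouped = {}
    for person, color in rows:
        grouped.setdefault(person, set()).add(color)
    return {(person, frozenset(colors)) for person, colors in grouped.items()}


def unnest(nested):
    """Flatten (person, set of colors) rows back into (person, color) rows."""
    return {(person, color) for person, colors in nested for color in colors}


flat = {("Bob", "blue"), ("Bob", "red"),
        ("Jane", "green"), ("Jane", "yellow"), ("Jane", "red")}

nested = nest(flat)
round_trip = unnest(nested)  # unnest undoes nest

# A nested relation in which Bob appears on two rows.
split = {("Bob", frozenset({"blue"})), ("Bob", frozenset({"red"}))}
renested = nest(unnest(split))  # nest merges Bob's rows, so split is not recovered
```

Here unnest(nest(flat)) returns the original flat relation, while nest(unnest(split)) returns a single merged row for Bob, which shows why nest is not always the inverse of unnest.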