Database normalization is the process of organizing data in a database to reduce data redundancy and improve data integrity. It involves dividing large tables with anomalies into smaller well-structured tables and defining relationships between them. The objectives of normalization include removing data anomalies, minimizing redesign when extending the database structure, making the data model more informative to users, and avoiding bias towards any particular querying pattern. Edgar Codd, inventor of the relational model, introduced the concept of normalization and the first normal form in 1970.
Database Normalization
From Wikipedia, the free encyclopedia
In relational database (RDBMS) design, the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations. Normalization usually involves dividing large, badly-formed tables into smaller, well-formed tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970.[1] Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971,[2] and Codd and Raymond F. Boyce defined the Boyce–Codd normal form (BCNF) in 1974.[3] Higher normal forms were defined by other theorists in subsequent years, the most recent being the Sixth normal form (6NF) introduced by Chris Date, Hugh Darwen, and Nikos Lorentzos in 2002.[4]

Informally, a relational database table (the computerized representation of a relation) is often described as "normalized" if it is in the Third Normal Form.[5] Most 3NF tables are free of insertion, update, and deletion anomalies, i.e. in most cases 3NF tables adhere to BCNF, 4NF, and 5NF (but typically not 6NF).

A standard piece of database design guidance is that the designer should create a fully normalized design.
Selective denormalization can subsequently be performed for performance reasons.[6] However, some modeling disciplines, such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF.[7]

Objectives of normalization

A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated using a "universal data sub-language" grounded in first-order logic.[8] (SQL is an example of such a data sub-language, albeit one that Codd regarded as seriously flawed.)[9]

The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:

1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
   (E.F. Codd, "Further Normalization of the Data Base Relational Model")[10]

The sections below give details of each of these objectives.

Free the database of modification anomalies

[Figure: An update anomaly. Employee 519 is shown as having different addresses on different records.]
[Figure: An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded.]
[Figure: A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.]

When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been sufficiently normalized. An insufficiently normalized table might have one or more of the following characteristics:

The same information can be expressed on multiple rows; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill.
Thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (that is, if the employee's address is updated on some records but not others), then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.

There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code. Thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses except by setting the Course Code to null. This phenomenon is known as an insertion anomaly.

There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member. This phenomenon is known as a deletion anomaly.

Minimize redesign when extending the database structure

When a fully normalized database structure is extended to allow it to accommodate new types of data, the pre-existing aspects of the database structure can remain largely or entirely unchanged. As a result, applications interacting with the database are minimally affected.

Make the data model more informative to users

Normalized tables, and the relationship between one normalized table and another, mirror real-world concepts and their interrelationships.

Avoid bias towards any particular pattern of querying

Normalized tables are suitable for general-purpose querying. This means any queries against these tables, including future queries whose details cannot be anticipated, are supported. In contrast, tables that are not normalized lend themselves to some types of queries, but not others.

For example, consider an online bookseller whose customers maintain wishlists of books they'd like to have. For the obvious, anticipated query (what books does this customer want?) it's enough to store the customer's wishlist in the table as, say, a homogeneous string of authors and titles.

With this design, though, the database can answer only that one single query. It cannot by itself answer interesting but unanticipated queries: What is the most-wished-for book? Which customers are interested in WWII espionage? How does Lord Byron stack up against his contemporary poets? Answers to these questions must come from special adaptive tools completely separate from the database. One tool might be software written especially to handle such queries. This special adaptive software has just one single purpose: in effect to normalize the non-normalized field.

Unforeseen queries can be answered trivially, and entirely within the database framework, with a normalized table.

Example

Querying and manipulating the data within an unnormalized data structure, such as the following non-1NF representation of customers' credit card transactions, involves more complexity than is really necessary:

    Customer   Transactions
    Jones      Tr. ID   Date          Amount
               12890    14-Oct-2003   -87
               12904    15-Oct-2003   -50
    Wilkins    Tr. ID   Date          Amount
               12898    14-Oct-2003   -21
    Stevens    Tr. ID   Date          Amount
               12907    15-Oct-2003   -18
               14920    20-Nov-2003   -70
               15003    27-Nov-2003   -60

To each customer there corresponds a repeating group of transactions. The automated evaluation of any query relating to customers' transactions therefore would broadly involve two stages:

1. Unpacking one or more customers' groups of transactions, allowing the individual transactions in a group to be examined, and
2. Deriving a query result based on the results of the first stage.

For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all customers, the system would have to know that it must first unpack the Transactions group of each customer, then sum the Amounts of all transactions thus obtained where the Date of the transaction falls in October 2003.

One of Codd's important insights was that this structural complexity could always be removed completely, leading to much greater power and flexibility in the way queries could be formulated (by users and applications) and evaluated (by the DBMS). The normalized equivalent of the structure above would look like this:

    Customer   Tr. ID   Date          Amount
    Jones      12890    14-Oct-2003   -87
    Jones      12904    15-Oct-2003   -50
    Wilkins    12898    14-Oct-2003   -21
    Stevens    12907    15-Oct-2003   -18
    Stevens    14920    20-Nov-2003   -70
    Stevens    15003    27-Nov-2003   -60

Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest simply by finding all rows with a Date falling in October and summing their Amounts. The data structure places all of the values on an equal footing, exposing each to the DBMS directly, so each can potentially participate directly in queries.
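
The October query described above can be written directly against the normalized table. The following is a minimal sketch using Python's built-in sqlite3 module, not a prescribed implementation: the table name `transactions`, the column names, and the ISO date strings are assumptions made for illustration, and the amounts are treated here as negative integers.

```python
import sqlite3

# Rows of the normalized table. Column names and the ISO date format are
# illustrative assumptions; amounts are treated as negative integers.
ROWS = [
    ("Jones", 12890, "2003-10-14", -87),
    ("Jones", 12904, "2003-10-15", -50),
    ("Wilkins", 12898, "2003-10-14", -21),
    ("Stevens", 12907, "2003-10-15", -18),
    ("Stevens", 14920, "2003-11-20", -70),
    ("Stevens", 15003, "2003-11-27", -60),
]


def october_total(rows):
    """Sum the amounts of all transactions dated in October 2003."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "customer TEXT, tr_id INTEGER PRIMARY KEY, tr_date TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    # One row per transaction, so a plain WHERE filter plus SUM is enough;
    # no repeating group has to be unpacked first.
    (total,) = conn.execute(
        "SELECT SUM(amount) FROM transactions "
        "WHERE tr_date >= '2003-10-01' AND tr_date < '2003-11-01'"
    ).fetchone()
    conn.close()
    return total


print(october_total(ROWS))
```

Because every value sits in its own column, a different question (for example, a total per customer) would only need a different SELECT statement, not new unpacking logic.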
In the unnormalized structure, by contrast, some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the unnormalized design does not.

Background to normalization: definitions

Functional dependency
In a given table, an attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y value. For example, in an "Employee" table that includes the attributes "Employee ID" and "Employee Date of Birth", the functional dependency {Employee ID} → {Employee Date of Birth} would hold: each {Employee ID} is associated with precisely one {Employee Date of Birth}. In practice an {Employee Date of Birth} might be null, in which case an {Employee ID} would be associated with no {Employee Date of Birth}. The converse dependency does not hold, since two employees may share a date of birth.

Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X, and not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y and Y → Z.

Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey
A superkey is a combination of attributes that can be used to uniquely identify a database record. A table might have many superkeys.

Candidate key
A candidate key is a minimal superkey, that is, a superkey that does not have any extraneous information in it. Example: imagine a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>. This table has many possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. Of those listed, only <SSN> is a candidate key, as the others contain information not necessary to uniquely identify records.

Non-prime attribute
A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key
Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.

Normal forms

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.

The normal forms are applicable to individual tables.
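
The normal-form criteria for a single table are stated in terms of the functional dependencies and keys defined in the background section. As a rough illustration, the sketch below tests those definitions against sample rows of the "Employees' Skills" table. The row values and the helper names `holds`, `is_superkey`, and `candidate_keys` are hypothetical, invented for this example. Note the limitation: dependencies and keys are constraints on every possible state of a table, so a check on sample rows can refute a dependency but cannot prove that it holds in general.

```python
from itertools import combinations

# Hypothetical sample rows; the values are invented for illustration.
ROWS = [
    {"Employee ID": 1, "Employee Address": "12 Elm St", "Skill": "Typing"},
    {"Employee ID": 1, "Employee Address": "12 Elm St", "Skill": "Shorthand"},
    {"Employee ID": 2, "Employee Address": "9 Oak Ave", "Skill": "Typing"},
    {"Employee ID": 3, "Employee Address": "9 Oak Ave", "Skill": "Typing"},
]


def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by any pair of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a different value
        # for the same key contradicts the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_superkey(rows, attrs):
    """Return True if no two rows agree on all of the given attributes."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(keys)) == len(keys)


def candidate_keys(rows):
    """Return the minimal superkeys of the sample, smallest first."""
    attributes = sorted(rows[0])
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip any superkey that contains an already-found smaller key.
            minimal = not any(set(k) <= set(attrs) for k in found)
            if minimal and is_superkey(rows, attrs):
                found.append(attrs)
    return found


print(candidate_keys(ROWS))
print(holds(ROWS, ["Employee ID"], ["Employee Address"]))
print(holds(ROWS, ["Employee Address"], ["Employee ID"]))
```

In this sample the only candidate key is the pair (Employee ID, Skill), yet Employee Address is already determined by Employee ID alone. That is the dependency that is functional but not full, as described in the definitions, and it is the source of the update anomaly discussed earlier.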
To say that an entire database is in normal form n is to say that all of its tables are in normal form n.

Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.

The main normal forms are summarized below.

| Normal form | Defined by | Brief definition |
| --- | --- | --- |
| First normal form (1NF) | Two versions: E.F. Codd (1970), C.J. Date (2003)[11] | Table faithfully represents a relation and has no repeating groups |
| Second normal form (2NF) | E.F. Codd (1971)[12] | No non-prime attribute in the table is functionally dependent on a proper subset of a candidate key |
| Third normal form (3NF) | E.F. Codd (1971)[13]; see also Carlo Zaniolo's equivalent but differently-expressed definition (1982)[14] | Every non-prime attribute is non-transitively dependent on every candidate key in the table |
| Boyce–Codd normal form (BCNF) | Raymond F. Boyce and E.F. Codd (1974)[15] | Every non-trivial functional dependency in the table is a dependency on a superkey |
| Fourth normal form (4NF) | Ronald Fagin (1977)[16] | Every non-trivial multivalued dependency in the table is a dependency on a superkey |
| Fifth normal form (5NF) | Ronald Fagin (1979)[17] | Every non-trivial join dependency in the table is implied by the superkeys of the table |
| Domain/key normal form (DKNF) | Ronald Fagin (1981)[18] | Every constraint on the table is a logical consequence of the table's domain constraints and key constraints |
| Sixth normal form (6NF) | C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002)[4] | Table features no non-trivial join dependencies at all (with reference to generalized join operator) |

Denormalization

Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during extract, transform, load (ETL) processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas.

Denormalization is also used to improve performance on smaller computers, as in computerized cash-registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made to the data and a swift response is crucial.

Non-first normal form (NF² or N1NF)

In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). The languages used to query and manipulate data in the model must be extended accordingly to support such values.

One way of looking at this is to consider such structured values as being specialized types of values (domains), with their own domain-specific languages. However, what is usually meant by non-1NF models is the approach in which the relational model and the languages used to query it are extended with a general mechanism for such structure.
For instance, the nested relational model supports the use of relations as domain values, by adding two additional operators (nest and unnest) to the relational algebra that can create and flatten nested relations, respectively.

Consider the following table:

First Normal Form

    Person   Favorite Color
    Bob      blue
    Bob      red
    Jane     green
    Jane     yellow
    Jane     red

Assume a person has several favorite colors. Obviously, favorite colors consist of a set of colors modeled by the given table.

To transform a 1NF table into an NF² table, a "nest" operator is required which extends the relational algebra of the higher normal forms. Applying the "nest" operator to the 1NF table yields the following NF² table:

Non-First Normal Form

    Person   Favorite Colors
    Bob      Favorite Color
             blue
             red
    Jane     Favorite Color
             green
             yellow
             red

To transform this NF² table back into a 1NF table, an "unnest" operator is required which extends the relational algebra of the higher normal forms. The unnest, in this case, would make "colors" into its own table.

Although "unnest" is the mathematical inverse to "nest", the operator "nest" is not always the mathematical inverse of "unnest". Another constraint required is for the operators to be bijective, which is covered by the Partitioned Normal Form (PNF).

See also