A State of The Art Review of Distributed Database Technology
October 5, 1992
Prepared for:
Rome Laboratory
RL/C3CB
Griffiss AFB, NY 13441-5700
Prepared by:
Carol Wawrzusin
Kaman Sciences Corporation
258 Genesee Street
Utica, NY 13502
Sponsoring Organization: Defense Technical Info. Ctr., DTIC/AI, Cameron Station, Alexandria, VA 22304
Monitoring Organization: Rome Laboratory, RL/C3C, Griffiss AFB, NY 13441
Sponsoring/Monitoring Agency Report Number: N/A
Supplementary Notes: Available from: Data & Analysis Center for Software, P.O. Box 120, Utica, NY 13503
Security Classification (report, this page, abstract): Unclassified
1. INTRODUCTION
During the past twenty years, the practice of organizing repositories of data under a central point of control
became commonplace. In an effort to overcome the unmanageable situation of applications generating and
maintaining autonomous files of data, corporations organized their data into "centralized" databases, free of
duplications and inconsistencies, managed by a database management system (DBMS) under the control of a
central database administrator. Guarded by the MIS minions, this practice gave management a secure hold over
corporate data, and at the same time gave the company's diverse internal organizations the illusion of being in
control of their own data. For the corporate decision makers, gathering data was a "simple" process of querying
the databases residing on the corporate mainframe.
In recent years the physical makeup of corporations, private entities, and government activities has changed
from a centralized to a distributed structure. The rise of giant conglomerates is an example of this change.
Within one of these composite organizations, even the manufacturing, engineering, and business units of one
component can be geographically dispersed. The distributed physical architecture of these organizations
demands that the information architecture also be distributed, embracing the concepts of open systems,
distributed computing, and hardware and software independence. A natural consequence of this distribution
is the requirement to distribute and manage the corporate data. Evolving from this need, the confluence of
database technology and network technology has now produced one of the newest members of software
engineering technology, called distributed database technology.
A distributed database is a collection of multiple, logically interrelated databases distributed over a computer
network. A distributed database management system (DDBMS) is a software system that permits the
management of distributed data, making the distribution transparent to the user. A distributed database is more
reliable and more responsive than a centrally located and controlled database; data can be entered where it is
generated, data at different sites can be shared, and data can be replicated, giving users the option of accessing
copies of the data in the event of a site or network failure. Outgrowing data storage resources or computing
power doesn't necessitate moving up to the next expensive mainframe; distributed database technology allows
affordable, incremental hardware growth.
As with all new technology, the definition of a distributed database was unclear until time and use brought
clarification. During the early eighties, vendors selling "distributed" databases, users who felt a need to
implement such a beast, and theorists who wrote articles dealing with the topic for technical publications all had
their own ideas about what constituted a "distributed" database.
In the Summer 1987 publication of InfoDB, C.J. Date proposed twelve rules that apply to a distributed database.
Like E.F. Codd's famous rules for the relational model, Date's have become a bible for distributed database
technology. Summarized, the rules are as follows:
o Site Autonomy - Each site maintains local privacy and control of its own data; users that commonly share
data can have it located at the site where they work
o No Central Site - The operation of the database does not depend on any single site; each site in the
network runs local applications independently of the other sites, or globally on data at remote sites;
no single DBMS is more necessary than any other
o Transparency - The location of the data does not need to be known to applications or users
o Replication Independence - Replication of data should be unknown to the user and updates to replicated
data are performed transparently to the user
o Distributed Query Processing - The performance of a query should be independent of the site at which it is
submitted; interleaved transactions updating multiple sites should be capable of serialization and, in
the event of failure, should leave the database in a consistent state
o Hardware Independence - The database should be able to integrate data from a wide variety of systems
o Operating System Independence - The database should be able to run on different operating systems
o DBMS Independence - Databases must be able to communicate with those of other vendors
o All of the data residing at any site X and participating in the global database will be treated by the users at site X
in exactly the same way as if it were a local database isolated from the rest of the network
Correctly implemented, distributed databases are more reliable, provide faster data access, reduce
communications load, and allow for the incremental upward scaling of hardware. Among the many motivations
for developing a distributed database, these are the most frequently encountered:
o a distributed organizational structure demands distributed data
A DDBMS is homogeneous if the same DBMS occurs at each site regardless of the hardware and operating
system.[6] Generally, when the motivation is to integrate pre-existing databases, the "bottom-up" design
solution involves heterogeneous databases - those belonging to several vendors and probably not based on the same
data model. In other situations, a "top-down" design can be used, which takes best advantage of the
functionality of a distributed database. In either case, the database designer will need to know what technology
is available to implement a distributed database system. Most DBMS vendors currently offer a "distributed"
version of their product, but because of the lack of standards, these offerings vary in the level of support given to
the various aspects of distributed database technology. Also, depending on the particular market served by a
vendor, some aspects of distributed database technology are emphasized while others are minimized or
non-existent.
In order to select the right DDBMS or to develop an optimum distributed design, the database system designer
must understand the relative merits of each feature and be able to make tradeoffs to effectively match
implemented features to the specific data needs to be supported.
The objective of this state of the art review is to review those unique features of distributed databases that
distinguish them from centralized databases and to examine currently available implementations of these
features.
2. THE STATE OF THE ART IN DISTRIBUTED DATABASE TECHNOLOGY
As in centralized databases, regardless of the underlying data model, the fundamental issue of distributed
databases is transparency. In a centralized database, transparency refers only to data independence; in a
distributed database, transparency refers to the data and to the network. According to Date's first rule, the
distributed database should appear to the user as one unified database. To accomplish this, not only the location
of the data, but the very existence of the network must be transparent to the user.
The flipside of the transparency issue is the issue of local autonomy. Each site participating in a distributed
database must be wholly independent; its operating system, administration, resident databases, and associated
catalogs must be totally autonomous.
The architecture of a distributed database permits a very large database to be supported on a collection of host
equipment of varying capacities and performance levels. Each participating site in a network is a
general-purpose computer that executes both local application programs and distributed database
management functions. These computers range in size from personal computers to powerful workstations and
parallel computers.
One of the strong points of distributed database technology is the ability to incrementally add host power to a
database structure. However, the design of the allocation of data to the hosts must consider the performance
characteristics of each host in order to ensure the most efficient operation of the distributed database system. A
frequently used portion of the database should not be allocated to a host with inferior performance
characteristics, to a host connected to an unreliable power supply, or to a host on a poorly performing or
overloaded Local Area Network (LAN). Taking advantage of the performance improvements offered by
parallel architectures, recent trends in distributed database technology are toward assigning database functions
to dedicated data servers where the servers are parallel processors. These machines are capable of hosting very
large (many gigabyte) databases, enhancing performance by concurrent execution of parallel, complex queries,
and significantly reducing the I/O bottleneck via parallel disk accessing.
[Figure: distributed database architecture. At each site, end users work through a local DBMS and its local
database; a local network/DBMS interface connects each site to the global DDBMS.]
Radio and/or satellite broadcast networks are also being employed as distributed database technology gains
popularity.
Not part of the technology of distributed databases per se, but certainly within the purview of the database
implementor, are the compression/decompression algorithms employed throughout the configuration of the
database. This includes an analysis of any bridge, router, and gateway hardware and software that may be part of
the total system. As the number and types of hosts (and workstations) on a network, and the number of local area
networks (LANs) comprising a system increase, the ability of the bridges, routers, and gateways to handle the
traffic efficiently is seriously affected.
Object-based logical models provide flexible structuring capabilities and allow the explicit specification of data
constraints. The entity-relationship (E-R) model is an example of an object-based logical model. It models the
real world as a collection of objects, called entities, and the relations between them. Shown in Figure 2 is an E-R
model of customers, accounts, and the relationship of "customer account".
Record-based logical models describe data at the conceptual and view levels, specifying both the overall logical
structure and the implementation, but not the data constraints. The relational model, a record-based logical
model, represents data and the relationships between data as a collection of tables. Figure 3 shows the customer
account as a relational model.
The network model, another record-based logical model, represents data by a collection of records and
represents relationships among data by links. Figure 4 is a network model of the customer account.
There has been research done into the "universal" model, which takes all the relations in a regular relational
database and glues them together by means of one operator (natural join) to form a single relation of very high
degree that contains all the information in the database.[40] The universal relational model aims at achieving
complete access-path independence in relational databases by relieving the user of the need for logical
navigation among relations. Access paths are embedded in attribute names, hiding all information about the
logical structure of the database from the user. Although relational databases removed the need for physical
navigation, access paths among relations must still be specified. The motivation behind the universal relational
model is to fully realize Codd's goal to free users from the need to specify access paths.
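The idea of gluing relations together with natural join can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relations `customer`, `holds`, and `account`, their attribute names, and the helper `ask` are hypothetical and do not come from the systems surveyed in [40].

```python
from functools import reduce

def natural_join(r, s):
    """Join two relations (lists of dicts) on every attribute name they share."""
    joined = []
    for a in r:
        for b in s:
            shared = set(a) & set(b)
            if all(a[k] == b[k] for k in shared):
                joined.append({**a, **b})
    return joined

# Hypothetical relations, invented for illustration only.
customer = [{"cust": "Smith", "city": "Utica"}, {"cust": "Jones", "city": "Rome"}]
holds = [{"cust": "Smith", "acct": 101}, {"cust": "Jones", "acct": 202}]
account = [{"acct": 101, "balance": 500}, {"acct": 202, "balance": 75}]

# One wide relation built with a single operator, natural join.
universal = reduce(natural_join, [customer, holds, account])

def ask(relation, wanted, **conditions):
    """Query by attribute name only; no path between relations is given."""
    return [tuple(row[a] for a in wanted)
            for row in relation
            if all(row[k] == v for k, v in conditions.items())]

smith_balance = ask(universal, ["balance"], cust="Smith")
```

The caller of `ask` names only attributes; which relations had to be joined to connect `cust` to `balance` stays hidden, which is the access-path independence the paragraph describes.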
Among the current leading contenders for use with distributed databases are the record-based relational and
network models and the object-oriented model. However, as it has with centralized databases, the relational
data model has become the de facto standard for DDBMS. E.F. Codd, founder of the relational model, holds
that distributed database technology is only feasible when based on a relational model. As characterized by
Codd, the relational model contains simple data structures, provides a solid foundation for data consistency,
and allows set-oriented manipulations of relations. These three powerful features have propelled the
relational model to the forefront of the technology.
The superiority of the relational model for use in distributed databases is disputed by recent work done at the
German National Research Center for Computer Science, which espouses the use of an object-oriented
database approach to distributed database management.[4] A discussion of this effort is found in section 2.6.3.
It should be noted that it is possible to build a distributed database system without a single "global" data model.
Providing a high degree of site autonomy by not enforcing a global data model or schema, the Sybase DDBMS
product supports distributed operations via application programming or database-oriented remote procedure
calls (RPCs) between Structured Query Language (SQL) Servers.[23] When multiple data models exist within a
distributed database system, the system must provide for mapping from structures of one DBMS to another and
for translating the commands of one DBMS's data manipulation language to their equivalents in the data
manipulation language of the other DBMS(s).[6] For example, Ingres' distributed product, Ingres/STAR,
provides these functions via gateway products (restrictions apply to the location of the global data dictionary).
Also providing transparent join and view of multiple databases, the Informix-STAR product includes an
extended synonyms feature permitting users to employ synonyms as pointers when tables are moved between
sites, thus freeing them from the need to specify which computer to access.
Whatever schema approach is chosen, users should be able to refer to and create tables by name without needing
to know where in the system the table is physically located or having to be concerned about naming conflicts. The
ability of the database to ensure unique system names is provided through a catalog called the data dictionary.
Information about sites and storage structures, database sizes and other statistics, access privileges,
fragmentation and replication of tables, and system naming conventions is kept in a global data dictionary,
which is itself a distributed database.
The conceptual problems associated with the schema are embodied in the data dictionary. If it is kept at a single
site, there then exists a single point of failure. If replicated at every node, then every change in its information
requires a change at every site. Some DDBMSs employ an approach in which each site maintains its own local
catalog which the system searches for each reference to a table. This method saves in the maintenance effort but
generates overhead network traffic.
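The trade-off of the local-catalog approach can be made concrete with a small sketch. The site names, table names, and the function `resolve` below are hypothetical, and the search order (local catalog first, then each remote site in turn) is an assumption rather than the behavior of any particular product.

```python
def resolve(table, local_site, catalogs):
    """Find which site holds `table`.

    `catalogs` maps site name -> set of table names in that site's local
    catalog. Returns (site, remote_lookups); each remote lookup stands for
    one round of network traffic.
    """
    if table in catalogs[local_site]:
        return local_site, 0
    remote_lookups = 0
    for site, catalog in catalogs.items():
        if site == local_site:
            continue
        remote_lookups += 1
        if table in catalog:
            return site, remote_lookups
    raise KeyError(f"table {table!r} is not in any catalog")

# Hypothetical catalogs, invented for illustration.
catalogs = {
    "Utica": {"Staff"},
    "Rome": {"Projects"},
    "Alexandria": {"Budgets"},
}

local_hit = resolve("Staff", "Utica", catalogs)
remote_hit = resolve("Budgets", "Utica", catalogs)
```

No catalog ever has to be updated at more than one site, but a reference to a remote table costs lookups that a replicated dictionary would have answered locally.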
Transparency refers to the separation of the higher-level semantics of a system from its lower-level
implementation issues. It is the fundamental characteristic of a distributed database, with the degree of
transparency being directly related to the degree of distribution. In order to achieve a high degree of
transparency, the system must automatically record and maintain information about the location of the data in
the database, the status of transactions, and the failure of any site or communication link in the network, and
must support commit and recovery protocols for ensuring transaction atomicity, isolation, and durability. These
concerns can be divided into four major issues pertaining to functionality: data location, function
distribution, transaction management, and query processing.
Data distribution is the single function under the control of the applications system designer or database
administrator (the remaining issues are generally an integral part of the DDBMS). The two key issues of
designing a distributed database are its data fragmentation and allocation. The purpose of fragmenting, or
breaking up, the tables of a database and allocating them to one or more sites in the network is to increase the
performance and/or reliability of applications using the database. The problems associated with the allocation
of the fragments are similar to those encountered in allocating files to nodes on a computer network. Although
much of the research into file allocation can be applied to fragment allocation, there currently are no
automated tools or allocation algorithms available to aid in evaluating alternative allocation designs.
2.2.1.1 Fragmentation
The ability to fragment a relation, dividing it into subrelations and allocating the subrelations to a subset of
participating sites, is the distinguishing feature of distributed database technology. Since applications generally
view subsets of a relation, subsets are natural units of distribution. Fragmentation permits this finer granularity
in the unit of data distribution.
Table 1 shows the global relationship Projects containing software project data. Within Projects are the project
number, the title of software project, its approximate dollar value, and the performing location.
Table 1. Projects
Two examples of possible horizontal partitionings of the global relationship Projects are shown below. In Tables
2(a), (b), and (c) the relationship is partitioned along the location column. The three resulting fragments would
ideally be allocated to hosts at their respective site locations. This scheme places geographically related data
closest to the location most likely to be accessing the data, and reduces the processing required when joins are
executed against the Projects data at each site. Of course, additional communications costs would be incurred if,
for instance, the New Jersey site required access to the Alabama data.
New-Jersey-Projects =
select * from Projects where Location = 'New Jersey'
Alabama-Projects =
select * from Projects where Location = 'Alabama'
Illinois-Projects =
select * from Projects where Location = 'Illinois'
The horizontal fragmentation of Projects based on dollar value would be defined as follows:
Projects-Over =
select * from Projects where $Value > 200,000
Projects-Not-Over =
select * from Projects where $Value <= 200,000
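The two horizontal partitionings defined above amount to selections on the global relation. The following Python sketch mirrors them; the rows in `projects` are hypothetical stand-ins for the contents of Table 1, and the function name `horizontal_fragment` is an assumption made for illustration.

```python
# Hypothetical rows standing in for the Projects relation of Table 1.
projects = [
    {"number": 1, "title": "Compiler", "value": 250_000, "location": "New Jersey"},
    {"number": 2, "title": "Editor", "value": 120_000, "location": "Alabama"},
    {"number": 3, "title": "Debugger", "value": 200_000, "location": "Illinois"},
    {"number": 4, "title": "Linker", "value": 310_000, "location": "Alabama"},
]

def horizontal_fragment(relation, predicate):
    """Return the rows of `relation` that satisfy `predicate` (a selection)."""
    return [row for row in relation if predicate(row)]

# Partitioning along the location column.
new_jersey_projects = horizontal_fragment(projects, lambda r: r["location"] == "New Jersey")
alabama_projects = horizontal_fragment(projects, lambda r: r["location"] == "Alabama")
illinois_projects = horizontal_fragment(projects, lambda r: r["location"] == "Illinois")

# Partitioning on dollar value.
projects_over = horizontal_fragment(projects, lambda r: r["value"] > 200_000)
projects_not_over = horizontal_fragment(projects, lambda r: r["value"] <= 200_000)
```

Each fragment keeps every column of the rows it selects; only the set of rows differs from fragment to fragment.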
Vertical fragmentation partitions a relation into smaller relations with the goal of minimizing the execution time
of user applications on the fragments. Joins performed on the smaller relations will require much less processing
time. The concept of vertical partitioning, developed within the context of centralized databases for the same
reason, is useful in distributed databases where each fragment may contain data with common geographical
properties. Table 3 shows a vertical partitioning of Projects. Note that the primary key of Projects, project
number, appears as the primary key of each vertical fragment.
Projects-Title-Location =
select Number, Title, Location from Projects
Projects-$Value =
select Number, $Value from Projects
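A vertical partitioning is a projection that always carries the key, and the key is what lets the fragments be joined back together. The sketch below shows this in Python; the sample rows and the helper names `vertical_fragment` and `rejoin` are hypothetical.

```python
# Hypothetical rows standing in for the Projects relation.
projects = [
    {"number": 1, "title": "Compiler", "value": 250_000, "location": "New Jersey"},
    {"number": 2, "title": "Editor", "value": 120_000, "location": "Alabama"},
    {"number": 3, "title": "Debugger", "value": 200_000, "location": "Illinois"},
]

def vertical_fragment(relation, key, attributes):
    """Project `relation` onto the key plus the named attributes."""
    columns = (key, *attributes)
    return [{c: row[c] for c in columns} for row in relation]

def rejoin(left, right, key):
    """Recombine two vertical fragments by joining on their shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

projects_title_location = vertical_fragment(projects, "number", ("title", "location"))
projects_value = vertical_fragment(projects, "number", ("value",))

rebuilt = rejoin(projects_title_location, projects_value, "number")
```

Because `number` appears in both fragments, `rebuilt` contains the same rows as `projects`.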
Keeping in mind that the performance of query execution will be affected by the extent to which a database is
fragmented, the database designer must determine the correct level of fragmentation while maintaining the
following properties:
1. Completeness - If a global relation is decomposed into fragments, each data item that can be found in
the global relationship can also be found in one or more of the fragments. This property ensures against
loss of data.
2. Reconstruction - It must always be possible to reconstruct a global relation by joining the fragments
together. Horizontal fragments can be recombined by using the SQL UNION operator. In vertical
partitioning this is generally accomplished by including the key of the global relationship in each
fragment, guaranteeing the reconstruction through a join relationship. This property ensures that
constraints defined on the data in the form of dependencies are preserved.
3. Disjointedness - If a global relation is horizontally decomposed into fragments, any individual data item
can be found in only one of the fragments. Since the primary key attributes of a relation are typically
repeated in each of its vertical fragments, disjointedness is defined only on the nonprimary key attributes
of a vertical fragmentation.
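For a horizontal fragmentation, the three properties above can be checked mechanically. The following is a minimal sketch, assuming rows are hashable tuples and treating a relation as a set of rows; the function names and the sample data are hypothetical.

```python
def is_complete(relation, fragments):
    """Every row of the global relation appears in at least one fragment."""
    return set(relation) <= set().union(*fragments)

def is_reconstructible(relation, fragments):
    """The union of the fragments yields exactly the global relation."""
    return set().union(*fragments) == set(relation)

def is_disjoint(fragments):
    """No row appears in more than one fragment."""
    total_rows = sum(len(set(f)) for f in fragments)
    return total_rows == len(set().union(*fragments))

# Hypothetical (number, value, location) rows.
projects = [
    (1, 250_000, "New Jersey"),
    (2, 120_000, "Alabama"),
    (3, 200_000, "Illinois"),
]

over = [r for r in projects if r[1] > 200_000]
not_over = [r for r in projects if r[1] <= 200_000]
good = [over, not_over]

# A flawed design whose predicates overlap and also miss row 1.
bad = [
    [r for r in projects if r[1] <= 200_000],
    [r for r in projects if r[1] <= 150_000],
]
```

The `good` design passes all three checks. The `bad` design fails completeness and reconstruction because row 1 is lost, and fails disjointedness because row 2 lands in both fragments.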
In a distributed database system, a query written against a fragmented database would look exactly like a query
written against a centralized database. However, since no current DDBMS product supports fragmentation, the
user must know how the database is fragmented to be able to construct correct queries.
Table 4 shows the relationship Staff occurring in the database of software engineering projects. Staff contains
the employee identification number, name, position, and number of current project assignment.
Table 4. Staff
To find out where programmers are currently working, the SQL query against either a distributed or centralized
database would be:
A true distributed database system would automatically expand this query to some equivalent of the following:
2.2.1.2 Replication
Once fragmentation has been completed, the individual fragments must be allocated to various sites on the
network. A big decision for the distributed database designer is whether any or all of the fragments should be
maintained at more than one site. If no data is replicated, the system is referred to as partitioned; if all the data
exists at every site, the system is termed fully replicated; if only some fragments exist at multiple sites, the system is
called partially replicated.
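These three terms can be read as a simple classification of an allocation. The sketch below assumes an allocation is recorded as a mapping from fragment name to the set of sites holding a copy; the function name, fragment names, and site names are hypothetical.

```python
def replication_scheme(allocation, sites):
    """Classify an allocation of fragments to sites.

    `allocation` maps fragment name -> set of sites holding a copy.
    """
    holders = list(allocation.values())
    if all(len(h) == 1 for h in holders):
        return "partitioned"
    if all(h == set(sites) for h in holders):
        return "fully replicated"
    return "partially replicated"

sites = ["New Jersey", "Alabama", "Illinois"]

partitioned = {
    "New-Jersey-Projects": {"New Jersey"},
    "Alabama-Projects": {"Alabama"},
    "Illinois-Projects": {"Illinois"},
}
full = {name: set(sites) for name in partitioned}
partial = dict(partitioned, **{"Alabama-Projects": {"Alabama", "New Jersey"}})
```

Here `replication_scheme(partial, sites)` reports a partially replicated system because only the Alabama fragment is kept at a second site.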
An optimal allocation of fragments must address both the costs associated with storing multiple copies and the
performance of the resultant system. Storing costs must include the physical storage costs, the querying costs,
and the updating costs, where updating involves concurrency control mechanisms and integrity enforcements
across multiple copies of data.
Transaction-oriented database applications demand a high level of reliability and availability. With multiple
copies, the probability that some copy will be available even when system failures occur is high. For read-only
queries accessing the same data, multiple copies provide an opportunity for parallel execution. While a system
with no data replication eliminates the complexities related to update synchronization, reliability and
performance requirements may dictate either full replication of data at each site or varying degrees of partial
replication. Although partial replication introduces the additional cost of remote accesses, the cost is low when
compared with the costs associated with write operations in a fully replicated situation.[37] Data placement is
essentially a trade-off between update costs and the benefits of increased reliability and performance.
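The availability side of this trade-off can be illustrated with a short calculation. The sketch assumes that sites fail independently and that each site is up with the same probability; neither assumption comes from the cited work, and the function names are hypothetical.

```python
def availability(site_up_probability, copies):
    """Probability that at least one of `copies` replicas is reachable,
    assuming independent site failures (an illustrative assumption)."""
    return 1 - (1 - site_up_probability) ** copies

def copies_written_per_update(copies):
    """Every replica must be written, so update work grows with the copy count."""
    return copies

one_copy = availability(0.9, 1)
three_copies = availability(0.9, 3)
```

Under these assumptions, going from one copy to three raises availability from about 0.9 to about 0.999, while each update now has to reach three sites instead of one.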
The query processor in a centralized DBMS transforms high-level queries into equivalent lower-level queries
which implement the execution strategy, focusing on optimization of performance primarily by reducing disk
accesses.
Distributed query processing must deal with the analysis, optimization, and execution of queries referencing
distributed data. Query optimization and execution in a distributed database environment involve global and
local optimization plans and the selection of access paths. Choices concerning the best site to process data and
how data should be moved between sites make the task of distributed query processing significantly more
complex than the centralized version.
A distributed query optimizer decomposes a query into a sequence of serial and parallel operations, groups the
operations that can be performed at the same site, and stages the transmission of results between sites to
eventually yield the desired result. The dynamic nature of a DDBMS adds to the complexity of the optimizer,
since each site must also carry on its own local execution load, while the network is subjected to varying traffic
patterns and bottlenecks. Optimizing distributed queries involves consideration of the following:
o replicated/fragmented data possibilities
Significant research has occurred in the area of distributed query processing. The results of this research can be
observed in the variety of implementations currently found in commercial and research systems. The research
emphasis has been on finding methods that minimize the costs associated with intersite communication. In most
cases, optimization is broken into two separate problems: selection of a global execution strategy, based on
intersite communication, and selection of each local execution strategy, based on centralized query processing
algorithms.[31]
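As a rough illustration of choosing a global strategy by intersite communication cost, the sketch below picks the site at which to execute a join so that the fewest bytes cross the network. The cost model (bytes shipped in, plus the result shipped back to the querying site) is a simplifying assumption, and the function name and sizes are hypothetical; it is not the method of any system cited in [31].

```python
def best_join_site(operand_bytes, result_bytes, query_site):
    """Pick the execution site that minimizes bytes sent between sites.

    `operand_bytes` maps site -> bytes of operand data stored there.
    Returns (site, bytes_transferred).
    """
    total = sum(operand_bytes.values())
    costs = {}
    for site in set(operand_bytes) | {query_site}:
        ship_in = total - operand_bytes.get(site, 0)      # operands held elsewhere
        ship_out = 0 if site == query_site else result_bytes
        costs[site] = ship_in + ship_out
    best = min(sorted(costs), key=costs.get)               # sorted: stable on ties
    return best, costs[best]

# Hypothetical sizes: a large fragment in New Jersey, a small one in Alabama.
choice = best_join_site(
    {"New Jersey": 1_000_000, "Alabama": 50_000},
    result_bytes=10_000,
    query_site="Alabama",
)
```

Even though the query was submitted in Alabama, the sketch chooses New Jersey: shipping the small fragment there and the result back costs 60,000 bytes, against 1,000,000 bytes to move the large fragment. Local strategy selection at the chosen site is then an ordinary centralized optimization problem.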
Managing transactions in a distributed database environment requires dealing with concurrency control, system
reliability, and the efficiency of the system as a whole. The execution of transactions must be done in a way that
preserves the characteristics of transactions, minimizes the cost, and maximizes system availability. The
transaction manager must provide the system with resiliency. Despite component failures, the system must be
able to continue operations and ensure that database consistency is not violated.
2.2.3.1 Concurrency Control
Concurrency control is the most difficult of the problems faced by distributed databases when data redundancy
is permitted. Generally, the techniques in use today to maintain data consistency while minimizing the overhead
of propagating control information to all nodes in the network are extensions of one or both of the same
techniques used in centralized databases - locks and timestamps. Likewise, when locking is used as the method
of synchronization, deadlock of the DDBMS can result. Well-established methods of deadlock prevention,
avoidance, and detection can be applied to distributed database systems.
2.2.3.1.1 Locking
Locking, the simplest form of concurrency control to implement, is the method most used in centralized DBMS
products. Those portions of a database involved in a read or write operation are "locked", made unavailable for
any other operations. Differences in DBMS products can be found in the granularity of the locks; products may
"lock" at the data item, the record, the page, the table or file, etc.
When used in a distributed database environment, the locking method results in long delays while the locking
protocol is propagated to all the affected nodes, the transaction is accomplished, and the acknowledgements are
again propagated. For an "n" node network, straightforward locking involves 5n internode messages to
accomplish one transaction as follows: n lock messages, n lock grant messages, n update messages, n update
acknowledgements, and n release lock messages. Several variations of locking, including the popular "two phase
commit", reduce the number of messages to 4n, 3n, and even 1.5n by using concepts such as majority locking,
where only a majority of the nodes are required for a commit rather than unanimous approval, and piggybacking
update messages on top of lock requests, but all of these techniques prove to be unsatisfactory in situations
involving large numbers of sites and high transaction volumes.[40]
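The message arithmetic above is easy to tabulate. The sketch below counts the five rounds of straightforward locking for an n-node network and scales the per-node factors quoted in the text (5n, 4n, 3n, 1.5n); the function and constant names are hypothetical, and no mapping from a specific locking variant to a specific factor is implied.

```python
def basic_locking_messages(n):
    """Per-round and total internode messages for straightforward locking."""
    rounds = {
        "lock request": n,
        "lock grant": n,
        "update": n,
        "update acknowledgement": n,
        "lock release": n,
    }
    return rounds, sum(rounds.values())

# Per-node factors quoted in the text for basic locking and its variations.
REPORTED_FACTORS = (5, 4, 3, 1.5)

def messages_for(n):
    """Total messages per transaction at each reported factor."""
    return [factor * n for factor in REPORTED_FACTORS]

rounds, total = basic_locking_messages(20)
scaled = messages_for(20)
```

For 20 nodes the basic scheme needs 100 messages per transaction and the best quoted variation still needs 30, which shows why the cost remains a concern as the number of sites and the transaction volume grow.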
Another variation of locking, the primary-site concept, involves funnelling all updates for given partitions of
the database through a primary site. When requests for data conform to well-defined patterns, for instance, by
geographical location, the primary site technique is effective; however, when requests can span multiple primary
sites, this technique can result in global database locking.[40]
2.2.3.1.2 Timestamping
Another concurrency control method under development today employs several different synchronization
techniques depending on the transaction being executed. At system design time, after an analysis of the ways in
which transactions can interfere with each other, several synchronization protocols are established which vary in
cost according to the level of control provided. Transactions are identified as belonging to a class depending on
the level of concurrency control required to maintain consistency. At run-time, the system does a table
look-up to determine which protocol to employ; if the transaction belongs to several classes, the system chooses
the most efficient; if it doesn't belong to any, the system imposes the strongest protocol defined. This technique,
implemented in IBM's experimental R* distributed database system, is reported as providing the fastest, lowest
cost method of concurrency control at this time.[40]
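
The run-time table look-up described above can be sketched as follows. This is only an illustration in Python: the
class names, protocol names, and cost ranking are hypothetical and are not taken from R* or any other system.

```python
# Hypothetical protocols ranked by cost; a higher cost means stronger control.
PROTOCOL_COST = {"no_sync": 0, "read_lock_only": 1, "full_two_phase_locking": 2}

# Hypothetical table built at system design time: transaction class -> protocol.
CLASS_PROTOCOL = {
    "read_only_report": "no_sync",
    "single_site_update": "read_lock_only",
    "multi_site_update": "full_two_phase_locking",
}


def choose_protocol(transaction_classes):
    """Return the protocol to use, given the classes a transaction belongs to."""
    known = [CLASS_PROTOCOL[c] for c in transaction_classes if c in CLASS_PROTOCOL]
    if not known:
        # Unclassified transaction: impose the strongest protocol defined.
        return max(PROTOCOL_COST, key=PROTOCOL_COST.get)
    # Member of several classes: choose the most efficient (cheapest) protocol.
    return min(known, key=PROTOCOL_COST.get)
```

The design-time analysis does the expensive work once; at run-time the decision is a dictionary look-up.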
2.2.3.1.4 Deadlock Management
Within a distributed database environment, the database recovery manager must deal with four types of failures:
transaction failures, media failures, site failures, and communication failures.
Transaction, media, and site failures are common to both centralized and distributed DBMSs. Transaction
failures, usually caused by an error in the data or by the existence or potential for deadlock, are handled by
aborting the transaction and restoring the database to its state prior to the transaction. Media failures, which
result in levels of data loss ranging from complete loss of the stable database and/or the database log to loss of
recent transactions, are most often repaired by either a full restore from an archive copy or a restore
accomplished by redoing and undoing transactions stored in the database log.
Unique to distributed databases, communication failures generally are related to messages that either contain
errors, are delivered out of sequence, or are lost. The lower three layers of the International Standards
Organization's Open Systems Interconnect (ISO/OSI) architecture are expected to handle the first two types of
message related errors. Lost messages, typically the result of communication line or site failures, must be
handled by the DDBMS. In the event of communication line failures the network may become divided, known as
partitioning, and each partition may continue operation. Maintaining the consistency of a distributed database
across a partitioned network, especially if replication of data exists across the partitions, is a monumental task
for the distributed transaction manager.
Protocols employed in reliability techniques include the commit, terminate, and recover protocols. Commit and
recover protocols exist in centralized DBMSs, but their implementation differs in DDBMSs. Maintaining the
atomicity of transactions across multiple sites implies that if a transaction fails at one site, it must be aborted at
all other sites. Termination protocols, unique to distributed databases, complement recovery protocols; while
recovery deals with re-establishing a consistent database across multiple sites, termination deals with
terminating active transactions when a failure has occurred at one or more sites. The commit protocol must
ensure that the effects of a transaction across the entire database are an all or nothing situation.
When contemplating entry into a new technology domain, most systems designers survey the current
implemented state-of-the-art. The majority of today's implemented distributed databases are of the
heterogeneous variety, having been developed as a response to the problem of integrating databases scattered
throughout organizations. Among the most documented cases, all having been underway for several years, are
General Motors' DATAPLEX, Amoco's Amoco Distributed Database System (ADDS), Xerox's MULTIBASE,
and the Integrated Manufacturing Data Administration System (IMDAS) developed to support the National
Bureau of Standards Automated Manufacturing Research Facility. Each of these systems is a special purpose,
one of a kind system, customized to include those features of distributed database technology which meet the
needs of the organization.
o communications protocols
o data models
o data representations
o transaction management protocols
Within these custom systems can be found sophisticated solutions to the problems of distributed databases.
However, these solutions are tailored to the needs of the particular implementation.
The most desirable distributed database design would be based on a single data model, preferably one in which
each site implemented the same database management system. In this situation, all the problems of disparate
schema integration and query language translation disappear. Because we hardly ever get to deal with the
"perfect" situation, several commercial distributed database products now support multiple data models and
network protocols via gateway products. Although all these products use the relational data model for the native
system, gateway products provide them with the ability to incorporate older hierarchical and network databases
as nodes. Factors to be considered when beginning the design of a distributed database include the following:
o the methodology employed to generate the global schema
2 3 4 Data Distribution
The advantages to be offered by data distribution must be fulfilled by the DDBMS. Offsetting the promises of
improved performance, reliability, and availability are the complexities related to update synchronization,
distribution of control, security and the general lack of experience dealing with distributed databases.
Therefore, when the development of or migration to distributed databases is contemplated, the degree of
distribution and level of location transparency supported by a DDBMS product are factors for serious
consideration.
In the design of a distributed database, it may be decided that the organization's structure, geographical
dispersion, or other data requirements may necessitate or lend itself to the use of fragmentation and/or
replication of relations. Although there are plans for it in every vendor's future, no distributed database product
currently supports transparent horizontal and/or vertical fragmentation. If the use of fragmentation is a
requirement, custom software must be written to support the level of transparency required.
On the other hand, most distributed database products currently support replicated data for query purposes;
with the exception of two products (see section 2.3.4), however, these same products only support single site
update within a single transaction.
Within the context of a DDBMS, location transparency boils down to naming transparency - providing unique
names for each object in the database. Implementations of this function range from requiring the user to provide
unique names to having the system embed site location names within the name of each database object.
Embedding locations in the object names can make it unwieldy when the user is required to specify the full name,
as in IBM's experimental R* system. The embedding practice causes other problems when objects are moved
across machines for performance optimization. Some systems elect to embed the "birth" site name in an object's
name, providing referencing functions within the system's data dictionary that resolve the current location of the
object. Other systems provide an aliasing capability for long names. However implemented, the best solution is
for the system to provide unique internal names for database objects and to translate the user names to these
transparently.
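
A small Python sketch may make the "birth site plus dictionary" scheme concrete. The name format
(object@birth_site), the site names, and the two dictionaries are assumptions made for illustration; they are not
the naming rules of R* or of any product.

```python
# Hypothetical data dictionary: internal name (object@birth_site) -> current site.
catalog = {"parts@utica": "utica"}

# Hypothetical alias table: short user name -> internal name.
aliases = {"parts": "parts@utica"}


def resolve(user_name):
    """Translate a user-visible name to (internal name, current site)."""
    internal = aliases.get(user_name, user_name)
    return internal, catalog[internal]


def move_object(internal_name, new_site):
    """Relocate an object; its internal name (with birth site) never changes."""
    catalog[internal_name] = new_site
```

Because only the dictionary entry changes when an object moves, applications that use the alias or the internal
name keep working after relocation.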
Distributed query capability can be found in just about every distributed database product, with significant
differences occurring in the query manager. Some current distributed DBMS products send queries to each
database, and then compile the results into one response rather than hand off the query to a distributed query
manager. Some products contain sophisticated cost optimizers.
The location of the data dictionary is a significant factor in query optimization. Some systems fully replicate the
dictionary at each site to expedite query processing; others maintain a centralized version of the dictionary with
the emphasis on expediting updates. Depending on the specific application to be implemented, the designer
must consider the dictionary's location(s). Using a product that supports a centralized dictionary has serious
limitations for an application with hefty distributed query requirements.
Most query optimizers are tied to data transmission costs; a cost-based optimizer reviews all possible semijoins,
determines the time and communications burden for each, and chooses the least cost alternative. The query
method that minimizes network traffic is generally considered the most cost effective. One commercial product,
Informix-STAR, has a verbose feature that informs the user of the costs involved for each SQL statement;
however, it only reveals the costs for the chosen alternative, not for all the possibilities. The Ingres/STAR
product boasts the industry's only "intelligent" optimizer; it relies on database sampling statistics and heuristics
to arrive at an optimal query processing strategy.
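
The "choose the least cost alternative" step can be illustrated with a few lines of Python. The cost formula (local
processing time plus bytes shipped times a per-byte transmission time) and all names below are simplifying
assumptions for illustration, not the cost model of any product mentioned here.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    bytes_shipped: int     # data moved over the network
    local_seconds: float   # local processing time


def plan_cost(plan, seconds_per_byte=1e-6):
    """Estimated cost: local work plus transmission time."""
    return plan.local_seconds + plan.bytes_shipped * seconds_per_byte


def choose_plan(plans, seconds_per_byte=1e-6):
    """Return the least-cost alternative among the candidate plans."""
    return min(plans, key=lambda p: plan_cost(p, seconds_per_byte))
```

With a slow network the plan that ships the least data tends to win; as seconds_per_byte shrinks, local
processing time begins to dominate the choice.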
Some databases take advantage of the parallelism offered in distributed databases by concurrently executing
sub-queries at remote sites, and then bringing the data together in some optimal manner for final processing.
Other products require that the processing be performed at the dataserver nearest the user; if the designer's
network contains dataservers with significant performance differences, this is a serious concern.
Distributed transaction management deals with the problem of concurrency - synchronizing transactions that
update redundantly stored data. Transaction management protocols handle the commit/abort decision at each
site in the distributed database. Fully implemented, these protocols require transaction logging, recovery,
commit, and deadlock detection/prevention features.
The capability to read and update data located at multiple sites within a single transaction, preserving the
properties of atomicity, isolation, and durability [15], may or may not be provided by a distributed database
management system. How the database handles distributed concurrency control and commit protocols (without
incurring excessive overhead costs while propagating control information), and the ability of the system to
continue operation despite a component failure (ensuring that database consistency is not violated), determines
the extent to which distributed transaction management is supported.
currently implement multisite update; updates are limited to either the local host or another single host in the
network.
The most well-known and widely used method for implementing concurrency control is two-phased locking in
which transactions wanting to read data obtain a shared lock on the data item, and transactions wanting to write
the data item obtain an exclusive lock. The granularity of the lock has been the subject of discussion and dispute;
generally, locking occurs at the record, or tuple, level, with a few systems locking at the file, or "relation" level.
Database systems that handle disk accesses themselves, rather than using I/O provided by the operating system,
may lock at the "page" level.
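
The shared/exclusive rule can be sketched as a small lock table. This Python class is a single-site illustration
only; the class and method names are hypothetical, waiting queues are omitted, and an "item" may stand for a
tuple, page, or relation depending on the granularity chosen.

```python
class LockTable:
    """Shared ('S') and exclusive ('X') locks on data items."""

    def __init__(self):
        self.holders = {}  # item -> (mode, set of transaction ids)

    def acquire(self, txn, item, mode):
        """Try to lock item in mode 'S' or 'X'; return True if granted."""
        if item not in self.holders:
            self.holders[item] = (mode, {txn})
            return True
        held_mode, txns = self.holders[item]
        if txns == {txn}:
            # The sole holder may re-request the lock or upgrade S to X.
            new_mode = "X" if "X" in (held_mode, mode) else "S"
            self.holders[item] = (new_mode, txns)
            return True
        if held_mode == "S" and mode == "S":
            txns.add(txn)  # readers share the item
            return True
        return False  # conflict: the caller must wait

    def release_all(self, txn):
        """Release every lock held by txn (done at commit or abort)."""
        for item in list(self.holders):
            _, txns = self.holders[item]
            txns.discard(txn)
            if not txns:
                del self.holders[item]
```

Releasing all locks together at the end of the transaction keeps the growing and shrinking phases separate,
which is what makes the scheme two-phased.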
The deadlock situation, in which two transactions each have a locked data item and are waiting for the other to
release the lock, is generally handled via deadlock detection mechanisms.
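
One common way to detect such a situation is to look for a cycle in a "wait-for" graph. The text does not say
which detection mechanism the products use, so the Python sketch below is an assumed, generic technique; the
function name and input format are hypothetical.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    waits_for maps each transaction to the set of transactions it waits on.
    """
    visiting, done = [], set()

    def visit(txn):
        if txn in done:
            return None
        if txn in visiting:
            return visiting[visiting.index(txn):]  # the cycle
        visiting.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in list(waits_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None
```

Once a cycle is found, one transaction in it is chosen as a victim and aborted so the others can proceed.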
2.3.4.2 Recovery Protocols
The most well-known and widely used method for implementing recovery protocol is the two-phase commit.
During the first phase, the participating sites indicate the ability and willingness to commit; during the second
phase, if all participants have answered affirmatively, the transaction is globally committed. If even one
participant responds negatively, or fails to respond, the transaction is aborted at every site.
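
The decision rule of the two phases can be shown in a few lines of Python. This is a sketch of the coordinator's
logic only, under the assumption that each site is modeled as a callable vote; logging, messaging, and recovery
are left out, and the function name is hypothetical.

```python
def two_phase_commit(participants):
    """Run one commit round; participants maps site name -> vote function.

    A vote function returns True (ready to commit) or False, or raises
    TimeoutError to model a site that fails to respond.
    """
    # Phase 1: collect the votes.
    decision = "commit"
    for site, vote in participants.items():
        try:
            if not vote():
                decision = "abort"
        except TimeoutError:
            decision = "abort"
    # Phase 2: the same decision is delivered to every site.
    return {site: decision for site in participants}
```

A single negative or missing vote is enough to abort everywhere, which is what gives the protocol its
all-or-nothing property.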
The successful implementation of the two-phased commit depends on a transaction logging function at each
site during which log records containing information for undoing and redoing transactions is written to
redundant, non-volatile storage. The two-phased commit is tolerant of failures as long as there is no loss of log
information. Protocols exist that deal with those situations where sites fail during the ready-commit
sequencing. One of the problems associated with the two-phased commit occurs when a communications
failure or a failure of the site initiating a transaction occurs resulting in a partitioned network (see section
2.3.4.3). Some sites may be blocked while waiting for the commit/abort command. During this time system
availability is affected by the held resources related to the blocked transaction. The practice of eliminating the
"ready" phase by having sites transmit a "ready" immediately after executing the transaction exacerbates the
blocking problem when a network or initiating site failure occurs.
Variations in the two-phased commit have been designed in an attempt to solve the blocking problem. The
"presumed commit/abort" variation assumes a transaction is committed/aborted if no information about it is
contained in the log. The "spooling" variation stores messages for a downed site at a predefined "spooling site".
When the site recovers, it applies the spooled messages. Another variation directs recovering sites to look for
lost information at other sites in the network.
2.3.4.3 Termination Protocols
Sites participating in a distributed database must have a consistent view of the network. If, because of a
communications failure, the network becomes partitioned, sites in each partition will have a different "view",
since all the sites in the other partition will appear to be down.
Addressing how sites deal with this type of communication failure, termination protocols handle the abortion of
executing transactions. These protocols, which use the timeout mechanism, vary depending on the stage of the
transaction, the kinds of communication permitted within the DDBMS, and whether it is the initiator of a
transaction or a participant that has failed.
Considering the initiator, if a failure (timeout) occurs while waiting for the participants to respond with a
commit/abort decision, then the transaction can be globally aborted. If a failure occurs while waiting for a
commit or abort acknowledgement, the initiator can only continue to wait. For a participating site, if it has
received an initial update message but never receives the prepare to commit or abort, it can abort the
transaction. However, if a participant has voted to commit a transaction, but never receives a commit message
from the initiator, it will be blocked from any further activity unless the system allows it to communicate with
another participating site.
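
The four timeout cases just described can be summarized as a decision table. The Python sketch below restates
them; the state and action names are hypothetical labels chosen for illustration.

```python
# (role, state when the timeout fires) -> action, following the cases above.
TIMEOUT_ACTION = {
    ("initiator", "awaiting_votes"): "abort_globally",
    ("initiator", "awaiting_acks"): "keep_waiting",
    ("participant", "awaiting_prepare"): "abort_locally",
    ("participant", "voted_commit"): "blocked",
}


def on_timeout(role, state, can_ask_peers=False):
    """Return the action a site takes when its timer expires."""
    action = TIMEOUT_ACTION[(role, state)]
    if action == "blocked" and can_ask_peers:
        # A blocked participant may learn the outcome from another site.
        return "ask_other_participant"
    return action
```

Only the participant that has already voted to commit is at risk of blocking; every other case has a safe
unilateral action.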
Blocking and non-blocking termination algorithms have been developed that deal with variations that may
arise when sites are allowed to "discuss" their transaction states.
2.3.4.4 Reliability
Two related aspects of the reliability of distributed database systems are correctness and availability. These two
factors are inversely related; imposing more of one results in less of the other. The trade-off needs to be
evaluated by the designer of the system at hand.
For non-redundant data, availability depends strictly on the occurrence of site or network failures; there is no
way to increase the reliability of the system. Increasing the availability of the system is a major goal when
introducing redundant data into a distributed database system.
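
A rough feel for the gain from redundancy can be had from a one-line calculation. The Python function below
assumes that sites fail independently, that each is up with the same probability, and that a read succeeds if any
one copy is reachable; these are simplifying assumptions for illustration, not claims about a particular system.

```python
def read_availability(site_up_probability, copies):
    """Probability that at least one of `copies` independent replicas is up."""
    return 1.0 - (1.0 - site_up_probability) ** copies
```

The same replicas that raise read availability must all be kept consistent on update, which is the correctness
side of the trade-off.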
Despite Date's rule concerning site autonomy, reality may dictate varying degrees of autonomy. Given that each
site maintains control of its own data, there may be compelling reasons for the existence, at some central site, for
any of the following:
2. A central scheduler, or coordinating process, responsible for synchronizing access to the global database
3. A central deadlock detector to which local sites periodically report information relating to transactions
waiting for resources; a simple detection mechanism, this may be a viable choice if the network has the
capacity to carry the extra communications load and if the issue of failures related to the time it takes to
transmit deadlock data to the central site has been considered
As in any field of engineering, a system's architecture defines its structure. Within the field of computer systems
we try to establish some reference architecture that we term a "standard". Software developers may deviate
from this reference, and in the past they have, but deviating in today's market is risky business.
Standards rely on proven and mature technology. The rapid innovation rate in this field makes standards
obsolete before they can be established. Since the relational data model and some variant of SQL have been
adopted by today's commercial DBMS products, these have been "standardized". Especially for heterogeneous
distributed databases, standards for both language and remote access are essential. If the two-phase commit
and two-phase locking protocols were standardized, implementations would be straightforward.
All of the available distributed database products support some version of IBM's Structured Query Language
(SQL). Although the American National Standards Institute (ANSI) established an SQL standard, SQL-86, in
1986, each product's version of SQL is different.
The SQL-86 standard has been vigorously attacked by E.F. Codd in at least three publications. The first two
occurred in a two part article, "Fatal Flaws in SQL", appearing in the August and September, 1988 editions of
Datamation. Codd reiterated and elaborated his complaints in his recent publication The Relational Model for
Database Management: Version 2.[38] Three flaws, described by Codd as having "grave consequences" are
these:
Since increasing numbers of businesses and government institutions are becoming dependent on relational
DBMSs for the success of their operations, Codd believes these flaws must be repaired. He recommends that
database users avoid duplicate rows within relations at all times, avoid nested versions of SQL statements
whenever a non-nested version is possible, and take extra care when manipulating relations that have columns
that may contain missing values.
The ANSI X3H2 Database Standards Committee is currently battling over the newly emerging standard called
SQL-2. Embroiled in the battle, but not on the committee, are Codd and Date. Leading problems to be solved
and/or negotiated are the following:
o Security issues associated with GRANT and REVOKE; REVOKE has been added and GRANT
now permits circular references
DBMS vendors have established proprietary interprocess communications protocols, but if heterogeneous
distributed databases are to flourish standards must be established and followed. In 1985 an ISO working group
was formed to work on Remote Access Standards.
The standardization necessary to interconnect heterogeneous hardware and support the transfer of data
between them is provided by the Open Systems Interconnect (OSI) protocol family of the ISO. Can all the
system functions of a distributed database management system be performed adequately at the applications
layer? There have been suggestions in the research community that this may be the correct approach.
There is an acute need for automated tools to support distributed databases. Tools are required for each of the
following:
Other than the performance tool associated with the Ingres/STAR query optimizer (discussed earlier), no
automated tools are provided with currently available distributed DBMS products. Developers of the
one-of-a-kind systems also express a desire for automated design and measurement tools.
Toward building long-lasting database applications, and planning for upward migration, the database designer
is urged to consider design strategies that insulate applications from changes that would otherwise be required
by future releases of your underlying DBMS product that increase its distributed functionality. If your DDBMS
does not support multisite update within a single transaction, provide custom software that makes it appear to the
application that it is supported. If your DDBMS does not support table replication, supplement it with custom
software that copies a remote table to the user's site transparently to the application. Follow Codd's advice
regarding avoiding capabilities existing in SQL today that may not be there in the future. When the time comes
that the missing feature is provided by the DDBMS, or standardization eliminates one that is there now, the
custom software is removed and all applications may take advantage of the increased functionality without
modification.
Distributed database technology's chief advantage is the ability to access data faster and cheaper than the
alternative centralized database approach. In order to make full use of this advantage, data must be able to be
located transparently throughout the system, updates to the data must be synchronized, and queries of the data
must be optimized to reduce not only the local disk accesses, but also the communications costs. Where the
system involves heterogeneous databases, the system must be able to cope with various SQL dialects and remote
procedure call protocols.
The situation within the distributed database research and development community is currently one directed by
market pull rather than technology push. The technology is going to advance based primarily on the needs of the
users, rather than on any radical breakthroughs accomplished in the research labs.
Date's twelve rules spell out the requirements for implementing true distributed databases, and until those rules
can be satisfied by a general purpose distributed database product, applications will not be able to take
advantage of the full functionality offered by this technology.
Researchers in some technology domains that have traditionally studied and developed in isolation now find
their technologies overlapping. Today, they are either joining ranks or are being forced into cooperation in order
to produce solutions that meet the growing demand of user communities. The following topics are all technology
areas that are being impacted by developments being made in distributed database technology.
Distributed database systems run as user applications on top of a host operating system. Although the topic of
distributed database operating systems has not been fully researched, there has been some discussion to the
effect that the performance and functionality of DBMSs can be improved by modifying and enhancing the
operating system to satisfy the additional requirements of DBMSs, particularly their transaction support, buffer
management, and concurrency control requirements.
Enhancements and improvements have been implemented in special purpose "database operating systems"
found in database machines, but not within the context of general purpose operating systems, although some
research operating systems designs now include some of the required functionality. Areas where operating
system change is contemplated are in the provision of the following:
o Network transparency
o User authentication and authorization control
Some examples of autonomous databases that are federated for the purpose of information retrieval are
dial-up information services such as CompuServe (TM) and The Source (TM). Dial-up services frequently
guide the user through a sequence of queries to arrive at the required information.
The techniques developed for distributed databases will not suffice for multidatabases. Currently, the emphasis
in this technology area is focusing on a common language to be used in managing information retrieval. The
major commercial systems are involved and, through ISO and SQL Access standards, they will be able to
cooperate in processing multidatabase queries.[25]
Object-oriented (OO) database systems are fast gaining support from the design communities. Where
relational databases meet the demands of business applications typified by very large amounts of
well-structured information, limited types and structures, and transactions that last for short lengths of time,
the OO data model supports entities that are objects with functional characteristics and supports the
requirement for dealing with long-lived transactions. The mathematical simplicity of relational database
cannot support complex data types or programming language control structures. The OO data model provides a
natural way to map real-world objects and their relationships directly to computer representations, meeting the
data modeling requirements of applications such as computer aided design (CAD), computer aided
manufacturing (CAM), computer aided software engineering (CASE), hypermedia and expert systems.
M.P. Papazoglou and L. Marinos[4] refute the position that the relational model is the model best suited for
supporting distributed database applications. Concentrating on distributed heterogeneous information
systems, they point out that the relational model does not adequately support the complex structures required
and has limited semantic expression capabilities. The data model must "facilitate the communication between
the users of diverse and incompatible information systems and assist ... with the uniform representation and
integration of heterogeneous data from one site to another." [4] Defined in terms of an object-oriented data
model which encapsulates the behavioral properties of the database objects, a distributed object-oriented
database management system (doodms) maps the data and processing components of the entire system into a
unique system wide object space.
The doodms, as envisaged by Papazoglou and Marinos, consists of a layered umbrella over each autonomous
DBMS. The umbrella is comprised of (1) a system language component, providing a system-wide query
language in addition to each site's own query language, (2) metadata data modules, mapping local conceptual
schemas into the distributed conceptual schema, and (3) the global transaction module which provides for
distributed query decomposition and execution, concurrency control, and recovery.
There are no implementations, commercial or research, of a doodms, but its flexible, modular approach and its
conformance to modern software engineering principles indicate that it will be forthcoming.
Knowledge bases are relational databases extended with logic - the capability of deducing new information
from existing information. Most of the technology required to implement distributed databases can be used to
implement distributed knowledge bases; the consistency of the knowledge base and its query processing
capabilities (especially recursive querying) being two of the major issues.
Current trends toward the development of knowledge bases, removing the "intelligence" of artificial
intelligence and expert systems from applications and placing it where it can be shared by many applications, is
spurring database researchers to expand and extend their efforts in distributed database technology to meet the
growing needs of knowledge bases.
3. SUMMARY
The data processing requirements of today's decentralized corporations together with advances in both
database and network technologies has led to the emergence of distributed database technologies. Although no
standards yet exist within this new technology, some guidelines have been provided by C.J. Date and E.F. Codd,
both developers of relational database technologies.
The fundamental issue of distributed databases is transparency, which, in a distributed database, refers to both
the data and the network. However, achieving transparency must not infringe on the autonomous nature of each
participating site.
Although several data models have been proposed for use with distributed databases, only the relational model
has been implemented in current commercial products. The relational model contains simple data structures,
provides a solid foundation for data consistency, and allows set-oriented manipulations of relations.
Distributed databases can be either homogeneous, where all participating local databases are based on the same
data model (and are from the same vendor), or heterogeneous, involving databases belonging to several vendors
probably not based on the same data model. Homogeneous distributed databases generally use a single, global
schema, while heterogeneous distributed databases may opt for either a single, integrated schema or a
federation of the local schemas.
To obtain the high degree of reliability and availability offered by distributed database technology, the relational
tables must be fragmented and/or replicated across multiple sites. Fragmentation involves partitioning a
relation either horizontally or vertically and allocating the partitioned relations to sites where the data is most
often required. This practice is most useful in situations where fragments contain data with common
geographical properties. Replication is a trade-off exercise between update costs and the benefits of increased
reliability and performance. Since many factors contribute to an optimal fragmentation/replication design, no
tools or algorithms have been developed to assist the designer in this task.
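
The two kinds of fragmentation can be illustrated on a relation held as a list of rows. The Python functions
below are a sketch only; the function names, the use of dictionaries for rows, and the choice of a single key
column are assumptions made for illustration.

```python
def horizontal_fragments(rows, site_of):
    """Partition rows by site: each fragment keeps whole rows."""
    fragments = {}
    for row in rows:
        fragments.setdefault(site_of(row), []).append(row)
    return fragments


def vertical_fragment(rows, key, columns):
    """Project rows onto the key plus chosen columns.

    Keeping the key in every vertical fragment allows the original
    relation to be rebuilt by joining the fragments on it.
    """
    return [{c: row[c] for c in (key, *columns)} for row in rows]
```

For example, an employee relation might be split horizontally by city so that each office stores its own staff,
while a vertical fragment holding only the key and salary columns is kept at the payroll site.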
Distributed query capability can be found in just about every distributed database product, with significant
differences occurring in the performance capabilities of the query manager. Distributed query processing
must deal with the analysis, optimization, and execution of queries referencing distributed data. The dynamic
nature of a DDBMS increases the complexity of optimizing, since each site must also carry on its own local
execution load, while the network is subjected to varying traffic patterns and bottlenecks. While past research
has led to the development of query optimizers for centralized databases, these optimizers are designed with the
goal of minimizing response time. Now they are being extended for distributed databases, with the objective of
optimizing both response time and communication cost.
Managing transactions in a distributed database environment requires dealing with concurrency control, system
reliability, and the efficiency of the system as a whole. Today's commercial DDBMS products use extensions of
one or both of the same techniques used in centralized databases - locks and timestamps. Differences in DBMS
products can be found in both the granularity of the locks and the particular implementation of the
popular two-phase commit protocol. Most current implementations are unsatisfactory in situations involving
large numbers of sites and high transaction volumes, and most restrict distributed update to a single site within a
single transaction.
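The two-phase commit protocol mentioned above can be sketched in a few lines of Python. This is a deliberately simplified model: the `Participant` class, its `can_commit` flag, and the state names are hypothetical, and the sketch omits the logging, timeouts, and failure recovery that any real implementation needs.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    """One site taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: each participant makes its own abort/commit decision.
        if self.can_commit:
            self.state = "prepared"
            return Vote.COMMIT
        self.state = "aborted"
        return Vote.ABORT

    def finish(self, decision):
        # Phase 2: the global decision is applied at every participant.
        self.state = "committed" if decision is Vote.COMMIT else "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant voted to commit."""
    votes = [p.prepare() for p in participants]
    decision = Vote.COMMIT if all(v is Vote.COMMIT for v in votes) else Vote.ABORT
    for p in participants:
        p.finish(decision)
    return decision


healthy_sites = [Participant("A"), Participant("B")]
healthy_outcome = two_phase_commit(healthy_sites)

mixed_sites = [Participant("A"), Participant("B", can_commit=False)]
mixed_outcome = two_phase_commit(mixed_sites)
```

Every participant is contacted twice per transaction, which suggests why the message overhead becomes a concern as the number of sites and the transaction volume grow.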
Despite the requirements for transparency and site autonomy, the lack of universally accepted standards and
differences in the implementation of the data dictionary in various versions of distributed database systems have
produced significant variations in the degree to which the requirements have been met. In order to select the
right DDBMS or to develop an optimum distributed design, the database system designer must understand the
relative merits of each feature and be able to make trade-offs to effectively match implemented features to the
specific data needs to be supported.
The development of distributed database technology is stimulating the development of new applications that
require support for distributed data. Advanced office automation systems, computer aided design systems, and
knowledge based systems are three that profit from the ability to share data across a network of computers.
BIBLIOGRAPHY
1. "A Benchmark for Performance Evaluation of a Distributed File System", Anna Hac, The Journal of
Systems and Software, Vol. 9 No. 4, May 1989, pp. 273-285.
2. "A Homogeneous Relational Model and Query Languages for Temporal Databases", Shashi K. Gadia,
ACM Transactions on Database Systems, Vol. 13 No. 4, December 1988, pp. 418-448.
3. "A Parallel Pipelined Relational Query Processor", Won Kim, Daniel Gajski and David J. Kuck, ACM
Transactions on Database Systems, Vol. 9 No. 2, June 1984, pp. 214-242.
4. "An Object-Oriented Approach to Distributed Data Management", M.P. Papazoglou and L. Marinos,
The Journal of Systems and Software, Vol. 11 No. 2, February 1990, pp. 95-109.
5. "Application Design for Distributed DB2", Rob Goldring, Database Programming & Design, Vol. 3
No. 9, September 1990, pp. 31-36.
6. "Architecture of Distributed Data Base Systems", Sudha Ram and Clark L. Chastain, The Journal of
Systems and Software, Vol. 10 No. 2, September 1989, pp. 77-95.
7. "Cracks in the ANSI Wall", Joe Celko, Database Programming & Design, Vol. 2 No. 6, June 1989, pp.
66-69.
8. Data Management on Distributed Databases, Benjamin W. Wah, UMI Research Press, Ann Arbor,
Michigan, 1981.
9. Database System Concepts, Henry F. Korth and Abraham Silberschatz, McGraw-Hill Book Company,
New York, N.Y., 1986.
11. "DB2 v. 2.2: A Few More Bells & Whistles", Craig S. Mullins, Database Programming & Design, Vol. 3
No. 6, June 1990, pp. 59-61.
12. Distributed Databases, Edited by I.W. Draffan and F. Poole, Cambridge University Press, London,
England, 1980.
13. "Distributed Databases", Herb Edelstein, DBMS, Vol. 3 No. 10, September 1990, pp. 36-48.
14. "Distributed Database for SAA", R. Reinsch, IBM Systems Journal, Vol. 27, No. 3, 1988, pp. 362-369.
Distributed Database Management Systems, Olin H. Bray, D.C. Heath and Company, Lexington,
Massachusetts, 1982.
Distributed Databases Principles and Systems, Stefano Ceri and Giuseppe Pelagatti, McGraw Hill
Book Company, New York, N.Y. 1984.
"Divide and Conquer Your Database", Dennis Livingston, Systems Integration, Vol. 24 No. 5, May 1991,
pp. 43-45.
"Does Client-Server Equal Distributed Database?", Beth Gold-Bernstein, Database Programming &
Design, September 1990, pp. 52-62.
"Dynamic File Migration in Distributed Computer Systems", Bezalel Gavish and Olivia R. Liu Sheng,
Communications of the ACM, Vol. 33 No. 2, February 1990, pp. 177-189.
"Emerald Bay's Quiet Return", Lan Barnes, DBMS, Vol. 3 No. 11, October 1990, pp. 50-57.
"Heterogenous Processing: A 4GL Case Study", Michael M. David, Database Programming & Design,
Vol. 4 No. 3, March 1991, pp. 27-34.
Intelligent Databases, Kamran Parsaye, Mark Chignell, Setrag Khoshafian, and Harry Wong, John
Wiley & Sons, Inc, New York, N.Y. 1989.
"Maintaining Availability in Partitioned Replicated Databases", Amr El Abbadi and Sam Toueg, ACM
Transactions on Database Systems, Vol. 14 No. 2, June 1989, pp. 265-290.
"Multiple-Query Optimization", Timos K. Sellis, ACM Transactions on Database Systems, Vol. 13 No.
1, March 1988, pp. 23-52.
"On the Foundations of the Universal Relation Model", David Maier, Jeffrey D. Ullman and Moshe Y.
Vardi, ACM Transactions on Database Systems, Vol. 9 No. 2, June 1984, pp. 283-308.
30. "On the Management of Long-Living Transactions", K. Brahmadathan and K. V. S. Ramarao, The
Journal of Systems and Software, Vol. 11 No. 1, January 1990, pp. 45-52.
Principles of Distributed Database Systems, M. Tamer Ozsu and Patrick Valduriez, Prentice Hall, Inc,
Englewood Cliffs, New Jersey 1991.
Relational Database Technology, Suad Alagic, Springer-Verlag, New York, N.Y., 1986.
"SQL2 An Emerging Standard", Jim Melton, Database Programming & Design, Vol. 3 No. 11,
November 1990, pp. 25-32.
"Strategic Database Planning", G. Lawrence Sanders, Database Programming & Design, Vol. 3 No. 11,
November 1990, pp. 52-56.
"The Case for Object-Oriented Databases", Thomas Atwood, IEEE Spectrum, February 1991, pp.
44-47.
"The Raid Distributed Database System", Bharat Bhargava and John Riedl, IEEE Transactions on
Software Engineering, Vol. 15 No. 6, June 1989, pp. 726-736.
The Relational Model for Database Management: Version 2, E.F. Codd, Addison-Wesley Publishing
Company, Reading, Massachusetts, 1990.
"The Trouble with Two-Phase Commit", George D. Tillmann, Database Programming & Design, Vol. 3
No. 9, September 1990, pp. 64-70.
Tutorial: Distributed Database Management, Philip A. Bernstein, James B. Rothnie, David W. Shipman,
IEEE Publishing Services, New York, N.Y., 1978.
"Update and Retrieval in a Relational Database Through a Universal Schema Interface", Volkert
Brosda and Gottfried Vossen, ACM Transactions on Database Systems, Vol. 13 No. 4, December 1988,
pp. 449-485.
"Why Choose Distributed Database?", Michael Krasowski, Database Programming & Design, Vol. 4
No. 3, March 1991, pp. 46-53.
APPENDIX A: GLOSSARY
bridge Network hardware that serves to restrict packets to a local segment of a network.
broadcast network A network in which all sites receive all the messages sent by another site; a mechanism
(typically a prefix containing an identification of the destination site) allows each site to recognize those
messages directed to it.
data distribution Refers to the partitioning, fragmentation, replication, and allocation of data among the sites
participating in a distributed database.
data manipulation language A language that enables users to access or manipulate data organized by a data
model. A procedural language requires the user to specify what data is needed and how to get it; a nonprocedural
language requires the user to specify what data is needed without specifying how to get it.
data mapping Data types or data values are converted for conformity with each other.
data model A collection of conceptual tools for describing data, data relationships, data semantics, and
consistency constraints.
deadlock avoidance Methods that employ either concurrency control techniques that never result in deadlocks
or require schedulers to detect potential deadlock situations in advance and ensure that they will not occur.
deadlock detection The detection of a state of deadlock and the preemption and abortion of one (or more)
transaction(s) until processing may continue.
deadlock prevention Method that guarantees deadlock cannot occur; all data items required for a transaction
are predeclared and must be accessible before the transaction is initiated.
distributed database A database system that provides access to data located at multiple sites in a network.
distributed processing Based on a collection of programs that are distributed among sites in a network, permits a
program at any site to invoke a program at another site in the network as if it were a locally resident subprogram.
distributed query processor A distributed database system module that, given a query, determines an execution
strategy that minimizes a system cost function which includes I/O, CPU, and communication costs.
durability Once a transaction is committed, the results of its operations will never be lost, independent of
subsequent failures.
Ethernet An example of a packet-switched network in which packets may vary in size from 64 bytes to 1,518
bytes and which operates at 10 megabits per second.
fragment groups Collections that include the primary fragment and those fragments resulting from derived
fragmentation.
fragmentation Dividing a global relation into subrelations and allocating the subrelations to sites participating
in the global database.
fragmentation, horizontal Data items in different local databases may be identified as logically belonging to the
same table in the global database; a relation partitioned along its rows.
fragmentation, vertical Data items in different local databases may be identified as logically representing the
same row in the global database but containing different attributes for the row; a relation partitioned into
smaller relations.
gateway Network software that permits the movement of information between networks of differing
communications protocols.
homogeneous database Refers to a distributed database in which each physical component runs under either the
same database management system or, at least, the same data model.
heterogeneous database Refers to a distributed database in which not all physical components run under the
same database management system; some literature refers to a distributed database as being heterogeneous if
the local nodes have different types of computers and operating systems, even if all local databases are based on
the same data model and even the same database management system.
isolation An incomplete transaction cannot reveal its results to other transactions before its commitment.
local autonomy Refers to the amount of control exercised by local database administrators within a distributed
database environment; local administrators with total control over that part of a distributed database at their
sites are said to be autonomous.
local recovery manager Module of a distributed database management system (one exists at a local site)
responsible for implementing local procedures by which the local database can be recovered to a consistent state
following a failure.
long-lived transaction Data that lives after the processes that created it terminate.
object-based logical model A data model used in describing data at the conceptual and view levels. The
conceptual level describes what data is stored in the database and what relationships exist among the data and
the view level restricts the conceptual level to part of the database. Object-based logical models allow the
explicit specification of data constraints.
packet-switched network A network in which messages are broken up into packets and each packet is
transmitted individually. The packets may travel independently and may take different routes.
replication Data items in different local databases may be identified as copies of each other.
router Network hardware that picks the optimal route to send traffic over a network.
schema Describes the database as it is stored; describes the physical format, storage locations, and access paths
and defines the logical structure.
schema, local Describes the data at the local sites in a distributed database.
serializability If several transactions are executed concurrently, the result must be the same as if they were
executed serially in some order.
timestamp Uniquely identifies a transaction; for two transactions A and B, if A occurred before B then the
timestamp of A is less than the timestamp of B.
transaction A sequence of operations which either are performed in entirety or are not performed at all; an
atomic unit of execution.
transparency Also called data independence, refers to the independence of application programs from the
physical or logical organization of the data.
transparency, distribution Refers to the independence of application programs from the physical location of the
data in a distributed database.
transparency, replication Refers to the lack of awareness by application programs of the existence of replicated
data in a distributed database.
two-phase commit A protocol that requires for each transaction a first phase during which an abort/commit
decision is made by each participant and a second phase during which the decision is implemented.
two-phase locking A locking protocol that requires for each transaction a first phase during which new locks are
acquired and a second phase during which locks are only released.
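The two-phase locking rule defined in the last entry can be made concrete with a minimal Python sketch. The `Transaction` class, the shared `lock_table` dictionary, and the use of exclusive locks only are assumptions made for illustration; real lock managers also handle shared locks, waiting, and deadlocks.

```python
class TwoPhaseLockingError(Exception):
    """Raised when a transaction tries to acquire a lock after releasing one."""


class Transaction:
    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table  # shared dict: data item -> owning transaction
        self.held = set()
        self.shrinking = False        # becomes True at the first release

    def lock(self, item):
        # First phase only: new locks may not be acquired once any lock is released.
        if self.shrinking:
            raise TwoPhaseLockingError(f"{self.name} is already releasing locks")
        owner = self.lock_table.get(item)
        if owner is not None and owner != self.name:
            return False  # conflict: the caller must wait or abort
        self.lock_table[item] = self.name
        self.held.add(item)
        return True

    def unlock(self, item):
        # Second phase: from here on, locks are only released.
        if item not in self.held:
            raise KeyError(item)
        self.shrinking = True
        self.held.remove(item)
        del self.lock_table[item]


lock_table = {}
t1 = Transaction("T1", lock_table)
t2 = Transaction("T2", lock_table)

t1_got_x = t1.lock("x")
t1_got_y = t1.lock("y")
t2_blocked = t2.lock("x")   # x is held by T1
t1.unlock("x")              # T1 enters its second phase
t2_got_x = t2.lock("x")     # x is now free for T2

try:
    t1.lock("z")
    t1_violated = False
except TwoPhaseLockingError:
    t1_violated = True
```

The `shrinking` flag is the whole protocol in this sketch: it separates the phase in which locks are acquired from the phase in which they are only released.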
APPENDIX B: VENDORS IN DISTRIBUTED DATABASE TECHNOLOGY
ASK Computer Systems Inc., Ingres Products Division, 1080 Marina Village Parkway, Alameda,
California 94501, (415) 769-1400
Gupta Technologies Inc., 1040 Marsh Road, Menlo Park, California 94025, (415) 321-9500
IBM Corporation, Old Orchard Road, Armonk, New York 10504, (914) 765-1900
Oracle Corporation, 500 Oracle Parkway, Redwood Shores, California 94065, (415) 506-7000
PeerLogic Inc., 333 DeHaro Street, San Francisco, California 94107, (415) 626-4545
Progress Software Corporation, 5 Oak Park, Bedford, Massachusetts 01730, (617) 275-4500
Ratliff Software Production Inc., 2155 Verdugo Blvd., Suite 20, Montrose, California 91020,
(818) 546-3850
Revelation Technologies, 2 Park Avenue, New York, New York 10016, (617) 275-4500
Saros Corporation, 10900 N.E. 8th Street, Bellevue, Washington 98004, (206) 646-1066
Sybase Inc., 6475 Christie Avenue, Emeryville, California 94608, (415) 596-3500
Tandem Computers Inc., 19333 Vallco Parkway, Cupertino, California 95014, (408) 965-7542
WordTech Systems Inc., 21 Altarinda Road, Orinda, California 94563, (415) 254-0900
XDB Systems, 14700 Sweitzer Lane, Laurel, Maryland 20707, (301) 317-6800