distributed-systems-pranay
DISTRIBUTED SYSTEMS
B.TECH. / CSE & IT / R18
SYLLABUS
UNIT–I
Characterization of Distributed Systems: Introduction, Examples of Distributed Systems, Resource sharing and the Web, Challenges.
System models: Introduction, Architectural and Fundamental models, Networking and Internetworking, Interprocess Communication.
Distributed objects and Remote Invocation: Introduction, Communication between distributed objects, RPC, Events and notifications, Case study - Java RMI.
UNIT–II
Operating System Support: Introduction, OS layer, Protection, Processes and Threads, Communication and Invocation, Operating system architecture, Distributed File Systems - Introduction, File Service architecture.
UNIT–III
Peer to Peer Systems: Introduction, Napster and its legacy, Peer to Peer middleware, Routing overlays, Overlay case studies - Pastry, Tapestry, Application case studies - Squirrel, OceanStore.
Time and Global States: Introduction, Clocks, events and Process states, Synchronizing physical clocks, logical time and logical clocks, global states, distributed debugging.
Coordination and Agreement: Introduction, Distributed mutual exclusion, Elections, Multicast communication, consensus and related problems.
UNIT–IV
Transactions and Concurrency Control: Introduction, Transactions, Nested Transactions, Locks, Optimistic concurrency control, Timestamp ordering.
Distributed Transactions: Introduction, Flat and Nested Distributed Transactions, Atomic commit protocols, Concurrency control in distributed transactions, Distributed deadlocks, Transaction recovery.
UNIT–V
Replication: Introduction, System model and group communication, Fault tolerant services, Transactions with replicated data.
Distributed shared memory, Design and Implementation issues, Consistency models.
*****
UNIT-I
CHARACTERIZATION OF DISTRIBUTED SYSTEMS
INTRODUCTION
Definition of Distributed System: A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages.
Introduction:
We define a distributed system as one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.
This simple definition covers the entire range of systems in which networked computers can usefully be deployed.
Computers that are connected by a network may be spatially separated by any distance.
They may be on separate continents, in the same building or in the same room.
Our definition of distributed systems has the following significant consequences:
1) Concurrency: In a network of computers, concurrent program execution is the norm. I can do my work on my computer while you do your work on yours, sharing resources such as web pages or files when necessary. The capacity of the system to handle shared resources can be increased by adding more resources (for example, computers) to the network.
2) No global clock: When programs need to cooperate they coordinate their actions by exchanging messages. Close coordination often depends on a shared idea of the time at which the programs' actions occur. But it turns out that there are limits to the accuracy with which the computers in a network can synchronize their clocks: there is no single global notion of the correct time.
3) Independent failures: All computer systems can fail, and it is the responsibility of system designers to plan for the consequences of possible failures. Distributed systems can fail in new ways. Faults in the network result in the isolation of the computers that are connected to it, but that doesn't mean that they stop running. In fact, the programs on them may not be able to detect whether the network has failed or has become unusually slow. Similarly, the failure of a computer, or the unexpected termination of a program somewhere in the system (a crash), is not immediately made known to the other components with which it communicates. Each component of the system can fail independently, leaving the others still running.
EXAMPLES OF DISTRIBUTED SYSTEMS
Our examples are based on familiar and widely used computer networks:
1) The Internet,
2) Intranets and
3) The emerging technology of networks based on mobile devices.
1) The Internet:
The Internet is a vast interconnected collection of computer networks of many different types.
Programs running on the computers connected to it interact by passing messages, employing a common means of communication.
The Internet is also a very large distributed system. It enables users, wherever they are, to make use of services such as the World Wide Web, email and file transfer.
Internet Service Providers (ISPs) are companies that provide modem links and other types of connection to individual users and small organizations, enabling them to access services anywhere in the Internet as well as providing local services such as email and web hosting.
Multimedia services are available in the Internet, enabling users to access audio and video data including music, radio and TV channels and to hold phone and video conferences.
2) Intranets:
An intranet is a portion of the Internet that is separately administered and has a boundary that can be configured to enforce local security policies.
It is composed of several local area networks (LANs) linked by backbone connections.
The network configuration of a particular intranet is the responsibility of the organization that administers it and may vary widely, ranging from a LAN on a single site to a connected set of LANs belonging to branches of a company or other organization in different countries.
An intranet is connected to the Internet via a router, which allows the users inside the intranet to make use of services elsewhere such as the Web or email. It also allows the users in other intranets to access the services it provides.
Many organizations need to protect their own services from unauthorized use by possibly malicious users elsewhere.
For example, a company will not want secure information to be accessible to users in competing organizations, and a hospital will not want sensitive patient data to be revealed.
Companies also want to protect themselves from harmful programs such as viruses entering and attacking the computers in the intranet and possibly destroying valuable data.
The role of a firewall is to protect an intranet by preventing unauthorized messages leaving or entering.
A firewall is implemented by filtering incoming and outgoing messages, for example according to their source or destination.
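The filtering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Message` type, the `INTRANET_PREFIX` and `BLOCKED_SOURCES` values, and the policy itself are hypothetical and are not taken from any real firewall product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    source: str       # sender address (hypothetical dotted form)
    destination: str  # receiver address
    payload: str


# Assumed policy values, chosen only for illustration.
INTRANET_PREFIX = "10.0."          # addresses inside the intranet
BLOCKED_SOURCES = {"203.0.113.7"}  # senders that are never admitted


def firewall_allows(msg: Message) -> bool:
    """Return True if the message may cross the intranet boundary."""
    # Filter on the source address first.
    if msg.source in BLOCKED_SOURCES:
        return False
    # Then require that at least one end of the message is inside the intranet.
    inside_source = msg.source.startswith(INTRANET_PREFIX)
    inside_destination = msg.destination.startswith(INTRANET_PREFIX)
    return inside_source or inside_destination
```

A real firewall typically also filters on ports and protocols; the sketch keeps only the source and destination check named in the text.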
3) Mobile and ubiquitous computing:
Technological advances in device miniaturization and wireless networking have led increasingly to the integration of small and portable computing devices into distributed systems.
These devices include:
i. Laptop computers.
ii. Handheld devices, including personal digital assistants (PDAs), mobile phones, pagers, video cameras and digital cameras.
iii. Wearable devices, such as smart watches with functionality similar to a PDA.
iv. Devices embedded in appliances such as washing machines, hi-fi systems, cars and refrigerators.
The portability of many of these devices, together with their ability to connect conveniently to networks in different places, makes mobile computing possible.
RESOURCE SHARING
We routinely share hardware resources such as printers, data resources such as files, and resources with more specific functionality such as search engines.
Looked at from the point of view of hardware provision, we share equipment such as printers and disks to reduce costs.
But of far greater significance to users is the sharing of the higher-level resources that play a part in their applications and in their everyday work and social activities.
For example, users are concerned with sharing data in the form of a shared database or a set of web pages, not the disks and processors on which they are implemented.
In practice, patterns of resource sharing vary widely in their scope and in how closely users work together.
At one extreme, a search engine on the Web provides a facility to users throughout the world, users who need never come into contact with one another directly.
We use the term service for a distinct part of a computer system that manages a collection of related resources and presents their functionality to users and applications.
For example, we access shared files through a file service; we send documents to printers through a printing service; we buy goods through an electronic payment service.
The only access we have to the service is via the set of operations that it exports. For example, a file service provides read, write and delete operations on files.
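As a minimal sketch of a service that is reachable only through the operations it exports, the following Python class offers just read, write and delete. The class name `FileService` and its in-memory storage are assumptions made for illustration; a real file service would sit behind a network interface.

```python
class FileService:
    """A toy file service: clients can use only the exported operations."""

    def __init__(self) -> None:
        # Private state; clients never touch this dictionary directly.
        self._files: dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        """Return the contents of the named file."""
        return self._files[name]

    def write(self, name: str, data: bytes) -> None:
        """Create or overwrite the named file."""
        self._files[name] = data

    def delete(self, name: str) -> None:
        """Remove the named file."""
        del self._files[name]
```

For example, after `service = FileService()` and `service.write("notes.txt", b"hello")`, a call to `service.read("notes.txt")` returns the stored bytes.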
Resources in a distributed system are physically encapsulated within computers and can only be accessed from other computers by means of communication.
For effective sharing, each resource must be managed by a program that offers a communication interface enabling the resource to be accessed and updated reliably and consistently.
The term server is probably familiar to most readers. It refers to a running program (a process) on a networked computer that accepts requests from programs running on other computers to perform a service and responds appropriately.
The requesting processes are referred to as clients, and the overall approach is known as client-server computing.
In this approach, requests are sent in messages from clients to a server and replies are sent in messages from the server to the clients.
When the client sends a request for an operation to be carried out, we say that the client invokes an operation upon the server.
A complete interaction between a client and a server, from the point when the client sends its request to when it receives the server's response, is called a remote invocation.
Clients are active (making requests) and servers are passive (only waking up when they receive requests); servers run continuously, whereas clients last only as long as the applications of which they form a part.
If a distributed system is written in an object-oriented language, resources may be encapsulated as objects and accessed by client objects, in which case we speak of a client object invoking a method upon a server object.
WEB
The World Wide Web is an evolving system for publishing and accessing resources and services across the Internet.
The Web is based on three main standard technological components:
1) The HyperText Markup Language (HTML) is a language for specifying the contents and layout of pages as they are displayed by web browsers.
2) Uniform Resource Locators (URLs), which identify documents and other resources stored as part of the Web.
3) A client-server system architecture, with standard rules for interaction (the HyperText Transfer Protocol - HTTP) by which browsers and other clients fetch documents and other resources from web servers.
1) HyperText Markup Language (HTML):
The HyperText Markup Language is used to specify the text and images that make up the contents of a web page, and to specify how they are laid out and formatted for presentation to the user.
A web page contains such structured items as headings, paragraphs, tables and images.
HTML is also used to specify links and which resources are associated with them.
Example:
<html>
<head>
<title>My Web Page</title>
</head>
<body bgcolor="yellow">
<h1 align="center">My Favourite Websites</h1>
<center><img src="picture.jpg" width="200" height="200" /></center>
<a href="http://www.google.co.in/">Google</a><br/>
<a href="http://www.yahoomail.com/">Yahoo Mail</a>
</body>
</html>
This HTML text is stored in a file that a web server can access - let us say the file sample.html.
2) Uniform Resource Locators (URLs):
The purpose of a Uniform Resource Locator is to identify a resource.
Browsers examine URLs in order to access the corresponding resources.
Sometimes the user types a URL into the browser.
More commonly, the browser looks up the corresponding URL when the user clicks on a link or selects one of their 'bookmarks', or when the browser fetches a resource embedded in a web page, such as an image.
HTTP URLs are the most widely used, for accessing resources using the standard HTTP protocol.
An HTTP URL has two main jobs to do:
i. To identify which web server maintains the resource, and
ii. To identify which of the resources at that server is required.
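The two jobs of an HTTP URL can be seen by splitting one with Python's standard `urllib.parse.urlsplit`. The helper name `split_http_url` and the example host are assumptions for illustration; the default port of 80 applies to plain `http` URLs only.

```python
from urllib.parse import urlsplit


def split_http_url(url: str) -> tuple[str, int, str]:
    """Split an http URL into (server host, server port, resource path)."""
    parts = urlsplit(url)
    # Job i: identify which web server maintains the resource.
    server = parts.hostname
    port = parts.port or 80  # plain HTTP defaults to port 80
    # Job ii: identify which resource at that server is required.
    path = parts.path or "/"
    return server, port, path
```

For instance, `split_http_url("http://www.example.org:8080/docs/index.html")` separates the server part (`www.example.org`, port 8080) from the resource part (`/docs/index.html`).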
3) HyperText Transfer Protocol (HTTP): The HyperText Transfer Protocol defines the ways in which browsers and other types of client interact with web servers.
Features:
i. Request-reply interactions: HTTP is a 'request-reply' protocol. The client sends a request message to the server containing the URL of the required resource. The server looks up the pathname and, if it exists, sends back the file's contents in a reply message to the client. Otherwise, it sends back an error response such as the familiar '404 Not Found'.
ii. Content types: Browsers are not necessarily capable of handling every type of content. When a browser makes a request, it includes a list of the types of content it prefers - for example, in principle it may be able to display images in 'GIF' format but not 'JPEG' format.
iii. One resource per request: Clients specify one resource per HTTP request. If a web page contains nine images, say, then the browser will issue a total of ten separate requests to obtain the entire contents of the page. Browsers typically make several requests concurrently, to reduce the overall delay to the user.
iv. Simple access control: By default, any user with network connectivity to a web server can access any of its published resources. If users wish to restrict access to a resource, then they can configure the server to issue a 'challenge' to any client that requests it. The corresponding user then has to prove that they have the right to access the resource, for example, by typing in a password.
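The request-reply behaviour and the one-resource-per-request rule can be imitated without any networking. The sketch below is a simulation only: `PUBLISHED`, `handle_request` and `requests_for_page` are hypothetical names, and no real HTTP messages are exchanged.

```python
# Hypothetical set of resources published by a web server, keyed by pathname.
PUBLISHED = {"/sample.html": "<html>...</html>"}


def handle_request(path: str) -> tuple[int, str]:
    """Look up the pathname and build the reply: contents or an error."""
    if path in PUBLISHED:
        return 200, PUBLISHED[path]
    return 404, "Not Found"


def requests_for_page(num_images: int) -> int:
    """One request for the page itself plus one for each embedded image."""
    return 1 + num_images
```

With nine embedded images, `requests_for_page(9)` gives the ten requests mentioned in the text.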
CHALLENGES
The following are the various challenges of distributed systems.
1) Heterogeneity:
The Internet enables users to access services and run applications over a heterogeneous collection of computers and networks.
Heterogeneity (that is, variety and difference) applies to all of the following:
i. networks;
ii. computer hardware;
iii. operating systems;
iv. programming languages;
v. implementations by different developers.
Use of Middleware:
The term middleware applies to a software layer that provides a programming abstraction as well as masking the heterogeneity of the underlying networks, hardware, operating systems and programming languages.
The Common Object Request Broker Architecture (CORBA) is an example. Some middleware, such as Java Remote Method Invocation (RMI), supports only a single programming language.
Most middleware is implemented over the Internet protocols, which themselves mask the differences of the underlying networks, but all middleware deals with the differences in operating systems and hardware.
In addition to solving the problems of heterogeneity, middleware provides a uniform computational model for use by the programmers of servers and distributed applications. Possible models include remote object invocation, remote event notification, remote SQL access and distributed transaction processing.
For example, CORBA provides remote object invocation, which allows an object in a program running on one computer to invoke a method of an object in a program running on another computer.
Heterogeneity and mobile code:
The term mobile code is used to refer to program code that can be transferred from one computer to another and run at the destination - Java applets are an example.
Code suitable for running on one computer is not necessarily suitable for running on another because executable programs are normally specific both to the instruction set and to the host operating system.
The virtual machine approach provides a way of making code executable on a variety of host computers: the compiler for a particular language generates code for a virtual machine instead of a particular hardware order code. For example, the Java compiler produces code for a Java virtual machine, which executes it by interpretation.
The Java virtual machine needs to be implemented once for each type of computer to enable Java programs to run.
2) Openness:
The openness of a computer system is the characteristic that determines whether the system can be extended and reimplemented in various ways.
The openness of distributed systems is determined primarily by the degree to which new resource-sharing services can be added and be made available for use by a variety of client programs.
Openness cannot be achieved unless the specification and documentation of the key software interfaces of the components of a system are made available to software developers.
3) Security:
Many of the information resources that are made available and maintained in distributed systems are very important to their users.
Their security is therefore of considerable importance.
Security for information resources has three components:
i. Confidentiality: protection against disclosure to unauthorized individuals.
ii. Integrity: protection against alteration or corruption.
iii. Availability: protection against interference with the means to access the resources.
In a distributed system, clients send requests to access data managed by servers, which involves sending information in messages over a network.
For example:
i. A doctor might request access to hospital patient data or send additions to that data.
ii. In electronic commerce and banking, users send their credit card numbers across the Internet.
In both examples, the challenge is to send sensitive information in a message over a network in a secure manner.
The second challenge here is to identify a remote user or other agent correctly. Both of these challenges can be met by the use of encryption techniques developed for this purpose. However, the following two security challenges have not yet been fully met:
Denial of service attacks: Another security problem is that a user may wish to disrupt a service for some reason. This can be achieved by bombarding the service with such a large number of pointless requests that the serious users are unable to use it. This is called a denial of service attack.
Security of mobile code: Mobile code needs to be handled with care. Consider someone who receives an executable program as an electronic mail attachment: the possible effects of running the program are unpredictable; for example, it may seem to display an interesting picture but in reality it may access local resources.
4) Scalability:
Distributed systems operate effectively and efficiently at many different scales, ranging from a small intranet to the Internet.
A system is described as scalable if it will remain effective when there is a significant increase in the number of resources and the number of users.
The number of computers and servers in the Internet has increased dramatically.
The table below shows the increasing number of computers and web servers over time.
Table 1.1: Growth of the Internet (computers and web servers)
5) Failure handling:
Computer systems sometimes fail.
When faults occur in hardware or software, programs may produce incorrect results or may stop before they have completed the intended computation.
Failures in a distributed system are partial - that is, some components fail while others continue to function.
Therefore the handling of failures is particularly difficult.
The following are techniques for dealing with failures:
Detecting failures: Some failures can be detected. For example, checksums can be used to detect corrupted data in a message or a file.
Masking failures: Some failures that have been detected can be hidden or made less severe. Two examples of hiding failures:
i. Messages can be retransmitted when they fail to arrive.
ii. File data can be written to a pair of disks so that if one is corrupted, the other may still be correct.
Tolerating failures: Most of the services in the Internet do exhibit failures - it would not be practical for them to attempt to detect and hide all of the failures that might occur in such a large network with so many components. Their clients can be designed to tolerate failures, which generally involves the users tolerating them as well.
For example, when a web browser cannot contact a web server, it does not make the user wait forever while it keeps on trying - it informs the user about the problem, leaving them free to try again later.
Recovery from failures: Recovery involves the design of software so that the state of permanent data can be recovered or "rolled back" after a server has crashed.
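Two of the techniques above, detecting corruption with a checksum and masking it by retransmission, can be combined in a short sketch. It assumes a hypothetical `channel` callable that carries bytes and may corrupt them; the framing format (a 4-byte CRC-32 prefix) is an illustrative choice, not a standard protocol.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 checksum (4 bytes, big-endian)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unframe(data: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    checksum, payload = data[:4], data[4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        return None  # failure detected: the data was corrupted
    return payload


def send_with_retries(payload: bytes, channel, max_attempts: int = 3) -> bytes:
    """Mask corruption by retransmitting until an intact copy arrives."""
    for _ in range(max_attempts):
        received = unframe(channel(frame(payload)))
        if received is not None:
            return received
    # Masking failed; the caller must now tolerate the failure.
    raise ConnectionError("message could not be delivered intact")
```

A checksum of this kind only detects accidental corruption; it offers no protection against deliberate tampering.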
6) Concurrency:
Both services and applications provide resources that can be shared by clients in a distributed system. There is therefore a possibility that several clients will attempt to access a shared resource at the same time.
The process that manages a shared resource could take one client request at a time.
But that approach limits throughput. Therefore services and applications generally allow multiple client requests to be processed concurrently.
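When requests are processed concurrently, the shared resource itself must stay consistent. The sketch below, using Python threads as stand-ins for clients, guards a shared counter with a lock; `Counter` and `serve_clients` are hypothetical names, and a lock is only one of several possible synchronization mechanisms.

```python
import threading


class Counter:
    """A shared resource that stays consistent under concurrent updates."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The lock makes the read-modify-write step atomic.
        with self._lock:
            self.value += 1


def serve_clients(n_clients: int, requests_each: int) -> int:
    """Run n_clients threads that each issue requests_each increments."""
    counter = Counter()

    def client() -> None:
        for _ in range(requests_each):
            counter.increment()

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because every update takes the lock, the final count equals the total number of requests regardless of how the threads interleave.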
7) Transparency:
Transparency is defined as the concealment from the user and the application programmer of the separation of components in a distributed system, so that the system is perceived as a whole rather than as a collection of independent components.
The two most important transparencies are access and location transparency; their presence or absence most strongly affects the utilization of distributed resources.
They are sometimes referred to together as network transparency.
SYSTEM MODELS
INTRODUCTION
Systems that are intended for use in real-world environments should be designed to function correctly in the widest possible range of circumstances and in the face of many possible difficulties and threats.
There is no global time in a distributed system, so the clocks on different computers do not necessarily give the same time as one another.
All communication between processes is achieved by means of messages: message communication over a computer network can be affected by delays, can suffer from a variety of failures and is vulnerable to security attacks.
These issues are addressed by three models:
1) The interaction model deals with performance and with the difficulty of setting time limits in a distributed system, for example for message delivery.
2) The failure model attempts to give a precise specification of the faults that can be exhibited by processes and communication channels. It defines reliable communication and correct processes.
3) The security model discusses the possible threats to processes and communication channels.
ARCHITECTURAL MODELS
"An architectural model of a distributed system is concerned with the placement of its parts and the relationships between them."
Examples include the client-server model and the peer-to-peer model.
An architectural model of a distributed system first simplifies and abstracts the functions of the individual components of a distributed system and then it considers:
⮡ The placement of the components across a network of computers - seeking to define useful patterns for the distribution of data and workload.
⮡ The interrelationships between the components - that is, their functional roles and the patterns of communication between them.
Software layers: The term software architecture referred originally to the structuring of software as layers or modules in a single computer or group of computers.
Platform: The lowest-level hardware and software layers are often referred to as a platform for distributed systems and applications. These low-level layers provide services to the layers above them.
Middleware: It is a layer of software whose purpose is to mask heterogeneity and to provide a convenient programming model to application programmers. Examples: CORBA, Java RMI.
System architectures: The division of responsibilities between system components (applications, servers and other processes) and the placement of the components on computers in the network is perhaps the most evident aspect of distributed system design.
The following are the two main types of architectural model:
Client-server: This is the architecture that is most often cited when distributed systems are discussed. It is historically the most important and remains the most widely employed. Figure 2.2 illustrates the simple structure in which client processes interact with individual server processes in separate host computers in order to access the shared resources that they manage.
Figure 2.2
Peer-to-peer: In this architecture all of the processes involved in a task or activity play similar roles, interacting cooperatively as peers without any distinction between client and server processes or the computers that they run on. While the client-server model offers a direct and relatively simple approach to the sharing of data and other resources, it scales poorly.
Figure 2.3 A distributed application based on the peer-to-peer architecture
Variations: Several variations on the above models can be derived from the
consideration of the following factors:
Services provided by multiple servers: Services may be implemented as several server processes in separate host computers interacting as necessary to provide a service to client processes (Figure 2.4). The servers may partition the set of objects on which the service is based and distribute them between themselves, or they may maintain replicated copies of them on several hosts.
Figure 2.4
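One simple way to partition a set of objects among several servers is to hash each object's identifier and take the result modulo the number of servers. The sketch below assumes string identifiers and a fixed server list; `server_for` is a hypothetical helper, and the scheme is one possible choice rather than the method any particular service uses.

```python
import hashlib


def server_for(object_id: str, servers: list[str]) -> str:
    """Pick the server responsible for an object by hashing its identifier."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto the servers.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

The mapping is deterministic, so every client that knows the server list locates the same server for a given object. A drawback of plain modulo hashing is that most objects move when the number of servers changes.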
Proxy servers and caches: A cache is a store of recently used data objects that is closer than the objects themselves. When a new object is received at a computer it is added to the cache store, replacing some existing objects if necessary. When an object is needed by a client process the caching service first checks the cache and supplies the object from there if an up-to-date copy is available. If not, an up-to-date copy is fetched.
Figure 2.5
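The cache behaviour just described can be sketched with an ordered dictionary that evicts the least recently used object when full. This is a simplified illustration: `ObjectCache` and its `fetch` callback are hypothetical, least-recently-used replacement is an assumed policy, and the check that a cached copy is still up to date is deliberately left out.

```python
from collections import OrderedDict


class ObjectCache:
    """A small cache of recently used objects with LRU replacement."""

    def __init__(self, capacity: int, fetch) -> None:
        self._capacity = capacity
        self._fetch = fetch            # called to obtain an object on a miss
        self._store = OrderedDict()    # oldest entries first

    def get(self, key):
        if key in self._store:
            # Cache hit: mark the object as recently used and return it.
            self._store.move_to_end(key)
            return self._store[key]
        # Cache miss: fetch the object and add it to the store.
        value = self._fetch(key)
        self._store[key] = value
        if len(self._store) > self._capacity:
            # Replace an existing object: drop the least recently used one.
            self._store.popitem(last=False)
        return value
```

A second `get` for the same key is served from the cache without calling `fetch` again, as long as the entry has not been evicted.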
Mobile code: Applets are a well-known and widely used example of mobile code - the user running a browser selects a link to an applet whose code is stored on a web server; the code is downloaded to the browser and runs there, as shown below:
An advantage of running the downloaded code locally is that it can give good interactive response since it does not suffer from the delays or variability of bandwidth associated with network communication.
Mobile agents: A mobile agent is a running program (including both code and data) that travels from one computer to another in a network carrying out a task on someone's behalf, such as collecting information, eventually returning with the results.
Design requirements for distributed architectures: The factors motivating the distribution of objects and processes in a distributed system are numerous and their significance is considerable.
i. Performance issues: Performance issues arising from the limited processing and communication capacities of computers and networks are considered under the following subheadings: responsiveness and throughput.
ii. Quality of service: Once users are provided with the functionality that they require of a service, such as the file service in a distributed system, we can go on to ask about the quality of the service provided. The main non-functional properties of systems that affect the quality of the service experienced by clients and users are reliability, security and performance.
iii. Use of caching and replication: Caches and web proxy servers raise the question of how cached copies of resources can be kept up to date when the resource at a server is updated. A browser or proxy can validate a cached response by checking with the original web server to see whether it is still up to date. If it fails the test, the web server returns a fresh response, which is cached instead of the stale response.
iv. Dependability issues: Dependability is a requirement in most application domains. We define the dependability of computer systems as correctness, security and fault tolerance. Dependable applications should continue to function correctly in the presence of faults in hardware, software and networks.
FUNDAMENTAL MODELS
The aspects of distributed systems that we wish to capture in our fundamental models are intended to help us to discuss and reason about:
Interaction: Computation occurs within processes; the processes interact by passing messages, resulting in communication (i.e., information flow) and coordination (synchronization and ordering of activities) between processes.
Failure: The correct operation of a distributed system is threatened whenever a fault occurs in any of the computers on which it runs.
Security: The modular nature of distributed systems and their openness exposes them to attack by both external and internal agents.
Interaction model: The discussion of system architectures indicates that distributed systems are composed of many processes, interacting in complex ways. For example:
Multiple server processes may cooperate with one another to provide a service.
A set of peer processes may cooperate with one another to achieve a common goal.
Interacting processes perform all of the activity in a distributed system. Each process has its own state, consisting of the set of data that it can access and update, including the variables in its program. The state belonging to each process is completely private - that is, it cannot be accessed or updated by any other process.
In this section, we discuss two significant factors affecting interacting processes in a distributed system:
1) Communication performance is often a limiting characteristic;
2) It is impossible to maintain a single global notion of time.
Performance of communication channels: The communication channels in our model are realized in a variety of ways in distributed systems, for example by an implementation of streams or by simple message passing over a computer network.
Communication over a computer network has the following performance characteristics relating to latency, bandwidth and jitter:
1) Latency: The delay between the start of a message's transmission from one process and the beginning of its receipt by another is referred to as latency.
2) Bandwidth: The bandwidth of a computer network is the total amount of information that can be transmitted over it in a given time. When a large number of communication channels are using the same network, they have to share the available bandwidth.
3) Jitter: Jitter is the variation in the time taken to deliver a series of messages.
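These three quantities can be tied together with a little arithmetic. The sketch below uses a common first-order estimate, transfer time = latency + message size / bandwidth, and measures jitter as the spread between the slowest and fastest delivery; both formulas and the function names are simplifying assumptions rather than definitions from the text.

```python
def transfer_time(latency_s: float, size_bits: float, bandwidth_bps: float) -> float:
    """Estimate the time to deliver one message: latency plus transmission time."""
    return latency_s + size_bits / bandwidth_bps


def jitter(delivery_times_s: list[float]) -> float:
    """Measure jitter as the spread between slowest and fastest delivery."""
    return max(delivery_times_s) - min(delivery_times_s)
```

For example, a 1 MB message (8,000,000 bits) over a 100 Mbit/s link with 5 ms latency takes about 0.005 + 0.08 = 0.085 seconds under this estimate.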
Computer clocks and timing events: Each computer in a distributed system has its own internal clock, which can be used by local processes to obtain the value of the current time. Therefore two processes running on different computers can associate timestamps with their events.
Failure model: In a distributed system both processes and communication channels may fail - that is, they may depart from what is considered to be correct or desirable behavior. These are presented under the headings omission failures, arbitrary failures and timing failures.
1) Omission failures: The faults classified as omission failures refer to cases when a process or communication channel fails to perform actions that it is supposed to do.
i. Process omission failures: The chief omission failure of a process is to crash. When we say that a process has crashed we mean that it has halted and will not execute any further steps of its program ever.
ii. Timeouts: A crash can be detected by means of timeouts, that is, a method in which one process allows a fixed period of time for something to occur. In an asynchronous system a timeout can indicate only that a process is not responding - it may have crashed or may be slow, or the messages may not have arrived.
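A timeout-based detector can be sketched as a function over the times at which each process was last heard from. The names `suspected` and `last_heard` are hypothetical, and the result is deliberately called a set of suspects: as noted above, in an asynchronous system a silent process may merely be slow.

```python
def suspected(last_heard: dict[str, float], now: float, timeout: float) -> set[str]:
    """Return the processes that have been silent for longer than the timeout.

    A process in the result is only *suspected* of having crashed; it may be
    slow, or its messages may not have arrived yet.
    """
    return {process for process, t in last_heard.items() if now - t > timeout}
```

For instance, with `last_heard = {"p1": 10.0, "p2": 4.0}`, `now = 12.0` and `timeout = 5.0`, only `p2` is suspected.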
2) Arbitrary failures:
The term arbitrary failure is used to describe the worst possible failure semantics, in which any type of error may occur. For example, a process may set wrong values in its data items, or it may return a wrong value in response to an invocation.
An arbitrary failure of a process is one in which it arbitrarily omits intended processing steps or takes unintended processing steps. Therefore arbitrary failures in processes cannot be detected by seeing whether the process responds to invocations because it might arbitrarily omit to reply.
3) Timing failures:
Timing failures are applicable in synchronous distributed systems where time limits are set on process execution time, message delivery time and clock drift rate.
In an asynchronous distributed system, an overloaded server may respond too slowly, but we cannot say that it has a timing failure since no guarantee has been offered.
Security model: We identified the sharing of resources as a motivating factor for distributed systems. The security of a distributed system can be achieved by securing the processes and the channels used for their interactions and by protecting the objects that they encapsulate against unauthorized access. Protection is described in terms of objects, although the concepts apply equally well to resources of all types.
Protecting objects: Figure 2.12 shows a server that manages a collection of objects on behalf of some users.
Figure 2.12
The users can run client programs that send invocations to the server to perform operations on the objects.
The server carries out the operation specified in each invocation and sends the result to the client.
The server is responsible for verifying the identity of the principal behind each invocation and checking that they have sufficient access rights to perform the requested operation on the particular object invoked, rejecting those that do not.
The client may check the identity of the principal behind the server to ensure that the result comes from the required server.
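The access-rights check performed by the server can be sketched as a lookup in a table of rights. The class `ObjectServer`, the exception `AccessDenied` and the layout of the rights table are assumptions for illustration, and the sketch takes the principal's identity as already verified, which is a separate problem.

```python
class AccessDenied(Exception):
    """Raised when a principal lacks the right to perform an operation."""


class ObjectServer:
    def __init__(self, objects: dict, rights: dict) -> None:
        self._objects = objects
        # rights maps (principal, object name) to a set of allowed operations.
        self._rights = rights

    def invoke(self, principal: str, operation: str, name: str, value=None):
        """Carry out the operation only if the principal has the right to."""
        allowed = self._rights.get((principal, name), set())
        if operation not in allowed:
            raise AccessDenied(f"{principal} may not {operation} {name}")
        if operation == "read":
            return self._objects[name]
        if operation == "write":
            self._objects[name] = value
            return value
        raise ValueError(f"unknown operation: {operation}")
```

A principal with only the `read` right on an object can fetch it, while an attempt to `write` it is rejected with `AccessDenied`.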
Securing processes and their interactions:
Processes interact by sending messages.
The messages are exposed to attack because the network and the communication service that they use is open, to enable any pair of processes to interact.
The enemy: To model security threats, we postulate an enemy (sometimes also known as the adversary) that is capable of sending any message to any process and reading or copying any message between a pair of processes, as shown in Figure 2.13.
Figure 2.13 - The enemy
i. Servers: Since a server can receive invocations from many different clients, it cannot necessarily determine the identity of the principal behind any particular invocation. Even if a server requires the inclusion of the principal's identity in each invocation, an enemy might generate an invocation with a false identity. Without reliable knowledge of the sender's identity, a server cannot tell whether to perform the operation or to reject it.
ii. Clients: When a client receives the result of an invocation from a server, it cannot necessarily tell whether the source of the result message is from the intended server or from an enemy, perhaps 'spoofing' the mail server.
Cryptography is the science of keeping messages secure, and encryption is the process of scrambling a message in such a way as to hide its contents. Modern cryptography is based on encryption algorithms that use secret keys - large numbers that are difficult to guess - to transform data in a manner that can only be reversed with knowledge of the corresponding decryption key.
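The idea of a transformation that can be reversed only with the key can be shown with a toy one-time pad, in which each message byte is combined with a key byte by exclusive-or. This is a teaching sketch, not one of the encryption algorithms used in practice: it assumes a random key as long as the message, used once only, and the function names are hypothetical.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Produce a random secret key of the given length in bytes."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt by XOR-ing each data byte with the matching key byte."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))
```

Applying `xor_bytes` twice with the same key returns the original message, so here the same key serves for both encryption and decryption.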
NETWORKING AND INTERNETWORKING
Distributed systems use local area networks, wide area networks and internetworks for communication.
Introduction: The networks used in distributed systems are built from a variety of transmission media, including wire, cable, fibre and wireless channels; hardware devices, including routers, switches, bridges, hubs, repeaters and network interfaces; and software components.
Networking issues for distributed systems:
1) Performance: The network performance parameters that are of primary interest for our purposes are those affecting the speed with which individual messages can be transferred between two interconnected computers. These are the latency and the point-to-point data transfer rate.
2) Scalability: The potential future size of the Internet is commensurate with the population of the planet. It is realistic to expect it to include several billion nodes and hundreds of millions of active hosts.
3) Reliability: The reliability of most physical transmission media is very high. When errors occur they are usually due to failures in the software at the sender or receiver (for example, failure by the receiving computer to accept a packet) or buffer overflow rather than errors in the network.
4) Security: The first level of defense adopted by most organizations is to protect their networks and the computers attached to them with a firewall. A firewall creates a protection boundary between the organization's intranet and the rest of the Internet.
5) Mobility: Mobile devices such as laptop computers, PDAs and Internet-capable mobile phones are moved frequently between locations and reconnected at convenient network connection points or even used while on the move. The Internet mechanisms have been adapted and extended to support mobility, but the expected future growth in the use of mobile devices will demand further development.
Page26of88
6) Qualityofservice:Thequalityofserviceastheabilitytomeetdeadlineswhentransm
ittingandprocessingstreamsofreal-timemultimediadata.
7) Multicasting:Whilethiscanbesimulatedbysendstoseveraldestinations,thatismor
ecostlythannecessary,andmaynotexhibitthefault-
tolerancecharacteristicsrequiredbyapplications.Forthesereasons,manynetworkt
echnologiessupportthesimultaneoustransmissionofmessagestoseveralrecipients
.
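The two performance parameters named in item 1 can be combined into a rough estimate of how long one message takes to transfer. The following is a minimal sketch, assuming the commonly used model "time = latency + length / data transfer rate"; the function name and the sample link figures are illustrative assumptions, not values taken from these notes.

```python
def message_transfer_time(latency_s, length_bits, rate_bits_per_s):
    """Estimate transfer time in seconds as latency + length / data transfer rate."""
    if rate_bits_per_s <= 0:
        raise ValueError("data transfer rate must be positive")
    return latency_s + length_bits / rate_bits_per_s

# Hypothetical link: 5 ms latency, 10 Mbit/s point-to-point data transfer rate.
small = message_transfer_time(0.005, 8 * 100, 10_000_000)        # 100-byte message
large = message_transfer_time(0.005, 8 * 1_000_000, 10_000_000)  # 1,000,000-byte message
print(f"small message: {small:.5f} s, large message: {large:.3f} s")
```

Under this model the latency dominates for small messages, while the data transfer rate dominates for large ones.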
Types of networks: Here we introduce the main types of network that are used to support distributed systems: personal area networks, local area networks, wide area networks, metropolitan area networks and the wireless variants of them. Internetworks such as the Internet are constructed from networks of all these types.
1) Personal area networks (PANs): PANs are a sub-category of local networks in which the various digital devices carried by a user are connected by a low-cost, low-energy network. Wired PANs are not of much significance because few users wish to be encumbered by a network of wires on their person, but wireless personal area networks (WPANs) are of increasing importance due to the number of personal devices such as mobile phones, PDAs, digital cameras, music players and so on that are now carried by many people.
2) Local area networks (LANs): LANs carry messages at relatively high speeds between computers connected by a single communication medium, such as twisted copper wire, coaxial cable or optical fibre. A segment is a section of cable that serves a department or a floor of a building and may have many computers attached.
3) Metropolitan area networks (MANs): This type of network is based on the high-bandwidth copper and fibre optic cabling recently installed in some towns and cities for the transmission of video, voice and other data over distances of up to 50 kilometers.
4) Wide area networks (WANs): WANs carry messages at lower speeds between nodes that are often in different organizations and may be separated by large distances. They may be located in different cities, countries or continents. The communication medium is a set of communication circuits linking a set of dedicated computers called routers.
5) Wireless local area networks (WLANs): WLANs are designed for use in place of wired LANs to provide connectivity for mobile devices or simply to remove the need for a wired infrastructure to connect computers within homes and office buildings to each other and the Internet.
6) Wireless metropolitan area networks (WMANs): The IEEE 802.16 WiMAX standard is targeted at this class of network.
7) Wireless wide area networks (WWANs): Most mobile phone networks are based on digital wireless network technologies such as the GSM (Global System for Mobile communication) standard, which is used in most countries of the world. Mobile phone networks are designed to operate over wide areas (typically entire countries or continents) through the use of cellular radio connections.
Internetworks: An internetwork is a communication subsystem in which several networks are linked together to provide common data communication facilities that overlay the technologies and protocols of the individual component networks and the methods used for their interconnection.
INTERPROCESS COMMUNICATION
Remote method invocation allows an object to invoke a method in an object in a remote process.
Examples of systems for remote invocation are CORBA and Java RMI.
In a similar way, a remote procedure call allows a client to call a procedure in a remote server.
Message-passing operations can be used to construct protocols to support particular process roles and communication patterns, for example remote method invocation.
The characteristics of interprocess communication: Message passing between a pair of processes can be supported by two message communication operations: send and receive, defined in terms of destinations and messages.
In order for one process to communicate with another, one process sends a message (a sequence of bytes) to a destination and another process at the destination receives the message.
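The send and receive operations can be illustrated with datagram sockets. This is only a sketch under stated assumptions: it uses UDP on the loopback interface, both "processes" are simulated inside one program, and the message contents are made up for the example.

```python
import socket

# Receiver: bind a UDP socket; its (host, port) pair acts as the message destination.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0 lets the operating system pick a free port
receiver.settimeout(2.0)          # avoid blocking forever if nothing arrives
destination = receiver.getsockname()

# Sender: send a message (a sequence of bytes) to the destination.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", destination)

# Receiver: receive the message that was queued at the destination.
message, source = receiver.recvfrom(1024)
print(message, "received from", source)

sender.close()
receiver.close()
```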
Client-server communication: This form of communication is designed to support the roles and message exchanges in typical client-server interactions. In the normal case, request-reply communication is synchronous because the client process blocks until the reply arrives from the server. It can also be reliable because the reply from the server is effectively an acknowledgement to the client.
The request-reply protocol is based on a trio of communication primitives: doOperation, getRequest and sendReply, as shown in the above figure.
The doOperation method is used by clients to invoke remote operations. getRequest is used by a server process to acquire service requests. sendReply is used by the server to send the reply message to the client.
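The interplay of the three primitives can be sketched inside a single program. In this illustration a queue stands in for the network and a thread stands in for the server process; the snake_case function names, the "add" operation and the "stop" request are assumptions made for the example, not part of any real protocol.

```python
import queue
import threading

requests = queue.Queue()   # stands in for the network path to the server

def do_operation(operation, arguments):
    """Client side: send a request, then block until the reply arrives."""
    reply_box = queue.Queue(maxsize=1)
    requests.put((operation, arguments, reply_box))
    return reply_box.get(timeout=2.0)

def get_request():
    """Server side: acquire the next service request."""
    return requests.get()

def send_reply(reply_box, result):
    """Server side: send the reply message back to the client."""
    reply_box.put(result)

def server_loop():
    operations = {"add": lambda a, b: a + b}   # hypothetical service
    while True:
        operation, arguments, reply_box = get_request()
        if operation == "stop":                # illustrative shutdown request
            send_reply(reply_box, None)
            break
        send_reply(reply_box, operations[operation](*arguments))

server = threading.Thread(target=server_loop)
server.start()
result = do_operation("add", (2, 3))
do_operation("stop", ())
server.join()
print(result)
```

The client blocks inside do_operation until the reply is available, which is the synchronous behaviour described for request-reply communication.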
RPC exchange protocols:
1) the request (R) protocol;
2) the request-reply (RR) protocol;
3) the request-reply-acknowledge reply (RRA) protocol.
Group communication: The pairwise exchange of messages is not the best model for communication from one process to a group of other processes, as for example when a service is implemented as a number of different processes in different computers.
DISTRIBUTED OBJECTS AND REMOTE INVOCATION
INTRODUCTION
This chapter is concerned with programming models for distributed applications - that is, those applications that are composed of cooperating programs running in several different processes. Such programs need to be able to invoke operations in other processes, often running in different computers.
The following models are developed based on the above concept.
The earliest and perhaps the best-known of these was the extension of the conventional procedure call model to the remote procedure call model, which allows client programs to call procedures in server programs running in separate processes and generally in different computers from the client.
In the 1990s, the object-based programming model was extended to allow objects in different processes to communicate with one another by means of remote method invocation (RMI). RMI is an extension of local method invocation that allows an object living in one process to invoke the methods of an object living in another process.
The event-based programming model allows objects to receive notification of the events at other objects in which they have registered interest.
COMMUNICATION BETWEEN DISTRIBUTED OBJECTS
The object-based model for a distributed system extends the model supported by object-oriented programming languages to make it apply to distributed objects.
The object model: A brief review of the relevant aspects of the object model, suitable for the reader with a basic knowledge of an object-oriented programming language, for example Java or C++.
Distributed objects: A presentation of object-based distributed systems, which argues that the object model is very appropriate for distributed systems.
The distributed object model: A discussion of the extensions to the object model necessary for it to support distributed objects.
Design issues: A set of arguments about the design alternatives.
Implementation: An explanation as to how a layer of middleware above the request-reply protocol may be designed to support RMI between application-level distributed objects.
Distributed garbage collection: A presentation of an algorithm for distributed garbage collection that is suitable for use with the RMI implementation.
The object model: An object-oriented program, for example in Java or C++, consists of a collection of interacting objects, each of which consists of a set of data and a set of methods. An object communicates with other objects by invoking their methods, generally passing arguments and receiving results.
Objects can encapsulate their data and the code of their methods. Some languages, for example Java and C++, allow programmers to define objects whose instance variables can be accessed directly. But for use in a distributed object system, an object's data should be accessible only via its methods.
Distributed objects: Distributed object systems may adopt the client-server architecture. In this case, objects are managed by servers and their clients invoke their methods using remote method invocation. In RMI, the client's request to invoke a method of an object is sent in a message to the server managing the object. The invocation is carried out by executing a method of the object at the server and the result is returned to the client in another message. To allow for chains of related invocations, objects in servers are allowed to become clients of objects in other servers.
The distributed object model: Each process contains a collection of objects, some of which can receive both local and remote invocations, whereas the other objects can receive only local invocations, as shown in the Figure below.
Method invocations between objects in different processes, whether in the same computer or not, are known as remote method invocations. Method invocations between objects in the same process are local method invocations. We refer to objects that can receive remote invocations as remote objects. In the Figure below, the objects B and F are remote objects.
Design Issues for RMI: The previous section suggested that RMI is a natural extension of local method invocation. In this section, we discuss two design issues that arise in making this extension:
1) The choice of invocation semantics - although local invocations are executed exactly once, this cannot always be the case for remote method invocations.
2) The level of transparency that is desirable for RMI.
Implementation of RMI:
Several separate objects and modules are involved in achieving a remote method invocation. These are shown in the above Figure, in which an application-level object A invokes a method in a remote application-level object B for which it holds a remote object reference.
Distributed garbage collection: The aim of a distributed garbage collector is to ensure that if a local or remote reference to an object is still held anywhere in a set of distributed objects, then the object itself will continue to exist, but as soon as no object any longer holds a reference to it, the object will be collected and the memory it uses recovered.
REMOTE PROCEDURE CALL (RPC)
A remote procedure call is very similar to a remote method invocation in that a client program calls a procedure in another program running in a server process.
Servers may be clients of other servers to allow chains of RPCs.
As mentioned in the introduction to this chapter, a server process defines in its service interface the procedures that are available for calling remotely.
In effect, this sort of service is rather like a single remote object in that it has state and methods.
However, it lacks the ability to create new instances of objects and therefore does not support remote object references.
As shown in the above figure, the client that accesses a service includes one stub procedure for each procedure in the service interface. The role of a stub procedure is similar to that of a proxy method.
It behaves like a local procedure to the client, but instead of executing the call, it marshals the procedure identifier and the arguments into a request message, which it sends via its communication module to the server.
When the reply message arrives, it unmarshals the results. The server process contains a dispatcher together with one server stub procedure and one service procedure for each procedure in the service interface.
The dispatcher selects one of the server stub procedures according to the procedure identifier in the request message.
A server stub procedure is like a skeleton method in that it unmarshals the arguments in the request message, calls the corresponding service procedure and marshals the return values for the reply message.
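The roles of the client stub, the dispatcher and the server stub can be sketched as follows. This is an illustration only: JSON is assumed as the marshalling format, the communication module is replaced by a direct function call, the server stub work is folded into the dispatcher for brevity, and the procedures add and upper (with identifiers 1 and 2) are hypothetical.

```python
import json

# Service procedures offered by the server (hypothetical service interface).
def add(a, b):
    return a + b

def upper(text):
    return text.upper()

SERVICE_PROCEDURES = {1: add, 2: upper}   # procedure identifier -> service procedure

def dispatcher(request_message):
    """Server: select by procedure identifier, unmarshal arguments, call, marshal result."""
    request = json.loads(request_message)
    procedure = SERVICE_PROCEDURES[request["proc"]]
    result = procedure(*request["args"])
    return json.dumps({"result": result})

def send_to_server(request_message):
    """Stand-in for the communication module; here it simply calls the server."""
    return dispatcher(request_message)

def make_stub(procedure_id):
    """Client: build a stub procedure that looks like a local procedure."""
    def stub(*args):
        request_message = json.dumps({"proc": procedure_id, "args": list(args)})
        reply_message = send_to_server(request_message)
        return json.loads(reply_message)["result"]
    return stub

remote_add = make_stub(1)
remote_upper = make_stub(2)
print(remote_add(2, 3), remote_upper("rpc"))
```

The caller uses remote_add exactly as it would a local function; all marshalling and message handling is hidden inside the stub.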
EVENTS AND NOTIFICATIONS
The idea behind the use of events is that one object can react to a change occurring in another object.
Notifications of events are essentially asynchronous and determined by their receivers.
In particular, in interactive applications, the actions that the user performs on objects, for example by manipulating a button with the mouse or entering text in a text box via the keyboard, are seen as events that cause changes in the objects that maintain the state of the application. The objects that are responsible for displaying a view of the current state are notified whenever the state changes.
Distributed event-based systems extend the local event model by allowing multiple objects at different locations to be notified of events taking place at an object. They use the publish-subscribe paradigm, in which an object that generates events publishes the type of events that it will make available for observation by other objects. Objects that want to receive notifications from an object that has published its events subscribe to the types of events that are of interest to them. Objects that represent events are called notifications.
Notifications may be stored, sent in messages, queried and applied in a variety of orders to different things. When a publisher experiences an event, subscribers that expressed an interest in that type of event will receive notifications. Subscribing to a particular type of event is also called registering interest in that type of event.
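The publish-subscribe paradigm described above can be sketched in a few lines. This is a local, single-process illustration; the class name Publisher, the event type "price_changed" and the dictionary used as the notification object are assumptions made for the example.

```python
from collections import defaultdict

class Publisher:
    """An object of interest that publishes the types of events it makes available."""

    def __init__(self, event_types):
        self.event_types = set(event_types)
        self.subscribers = defaultdict(list)   # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        """Register interest in one published event type."""
        if event_type not in self.event_types:
            raise ValueError(f"event type not published: {event_type}")
        self.subscribers[event_type].append(callback)

    def notify(self, event_type, details):
        """Send a notification object to every subscriber of this event type."""
        notification = {"type": event_type, "details": details}
        for callback in self.subscribers[event_type]:
            callback(notification)

received = []
stock = Publisher(["price_changed"])
stock.subscribe("price_changed", received.append)
stock.notify("price_changed", {"price": 101})
print(received)
```

In a distributed setting the callback would be replaced by a message sent to a remote subscriber, but the registration and notification steps stay the same.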
CASE STUDY: JAVA RMI
Java RMI extends the Java object model to provide support for distributed objects in the Java language. In particular, it allows objects to invoke methods on remote objects using the same syntax as for local invocations. In addition, type checking applies equally to remote invocations as to local ones. However, an object making a remote invocation is aware that its target is remote because it must handle RemoteExceptions; and the implementor of a remote object is aware that it is remote because it must implement the Remote interface. Although the distributed object model is integrated into Java in a natural way, the semantics of parameter passing differ because invoker and target are remote from one another.
The programming of distributed applications in Java RMI should be relatively simple because it is a single-language system - remote interfaces are defined in the Java language.
RMI registry: The RMI registry is the binder for Java RMI. An instance of RMI registry must run on every server computer that hosts remote objects.
UNIT-II
OPERATING SYSTEM SUPPORT
The operating system facilitates the encapsulation and protection of resources inside servers; and it supports the invocation mechanisms required to access these resources, including communication and scheduling.
INTRODUCTION
We have learned that an important aspect of distributed systems is resource sharing.
Client applications invoke operations on resources that are often on another node or at least in another process.
Applications (in the form of clients) and services (in the form of resource managers) use the middleware layer for their interactions.
Middleware provides remote invocations between objects or processes at the nodes of a distributed system.
Below the middleware layer is the operating system (OS) layer, which is the subject of this chapter. We shall be examining the relationship between the two, and in particular how well the requirements of middleware can be met by the operating system.
The task of any operating system is to provide problem-oriented abstractions of the underlying physical resources - the processors, memory, communications, and storage media.
THE OPERATING SYSTEM (OS) LAYER
The above Figure shows how the operating system layer at each of two nodes supports a common middleware layer in providing a distributed infrastructure for applications and services.
Our goal in this chapter is to examine the impact of particular OS mechanisms on middleware's ability to deliver distributed resource sharing to users.
We require at least the following of the components that manage resources (kernels and server processes):
1) Encapsulation: They should provide a useful service interface to their resources - that is, a set of operations that meet their clients' needs. Details such as management of memory and devices used to implement resources should be hidden from clients.
2) Protection: Resources require protection from illegitimate accesses - for example, files are protected from being read by users without read permissions, and device registers are protected from application processes.
3) Concurrent processing: Clients may share resources and access them concurrently. Resource managers are responsible for achieving concurrency transparency.
Core OS Functionality:
The above Figure shows the core OS functionality that we shall be concerned with: process and thread management, memory management, and communication between processes on the same computer (horizontal divisions in the figure denote dependencies). The kernel supplies much of this functionality - all of it in the case of some operating systems.
The core OS components are the following:
1) Process manager: Handles the creation of and operations upon processes. A process is a unit of resource management, including an address space and one or more threads.
2) Thread manager: Thread creation, synchronization and scheduling. Threads are schedulable activities attached to processes.
3) Communication manager: Communication between threads attached to different processes on the same computer. Some kernels also support communication between threads in remote processes.
4) Memory manager: Management of physical and virtual memory.
5) Supervisor: Dispatching of interrupts, system call traps and other exceptions; control of memory management unit.
PROTECTION
To understand what we mean by an "illegitimate access" to a resource, consider a file. Let us suppose, for the sake of explanation, that open files have only two operations, read and write.
Protecting the file consists of two sub-problems. The first is to ensure that each of the file's two operations can be performed only by clients with the right to perform it. For example, Smith, who owns the file, has read and write rights to it. Jones may only perform the read operation. An illegitimate access here would be if Jones somehow managed to perform a write operation on the file.
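The first sub-problem, checking that a client has the right to perform an operation, can be sketched as a table of rights consulted before every read or write. The class name ProtectedFile and the in-memory rights table are illustrative assumptions; a real kernel enforces such checks with hardware support rather than in application code.

```python
# Access rights per principal for one file (names follow the example above).
RIGHTS = {"Smith": {"read", "write"}, "Jones": {"read"}}

class ProtectedFile:
    def __init__(self, rights):
        self.rights = rights
        self.data = b""

    def _check(self, principal, operation):
        """Reject the operation unless the principal holds the matching right."""
        if operation not in self.rights.get(principal, set()):
            raise PermissionError(f"{principal} may not {operation} this file")

    def read(self, principal):
        self._check(principal, "read")
        return self.data

    def write(self, principal, data):
        self._check(principal, "write")
        self.data = data

f = ProtectedFile(RIGHTS)
f.write("Smith", b"report")          # allowed: Smith has the write right
print(f.read("Jones"))               # allowed: Jones has the read right
try:
    f.write("Jones", b"tampered")    # illegitimate access: Jones has no write right
except PermissionError as error:
    print("rejected:", error)
```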
Kernels and protection: The kernel is a program that is distinguished by the facts that it always runs and its code is executed with complete access privileges for the physical resources on its host computer. In particular, it can control the memory management unit and set the processor registers so that no other code may access the machine's physical resources except in acceptable ways.
PROCESSES AND THREADS
A process is a program under execution, and a thread is a part of a process.
A process consists of an execution environment together with one or more threads.
An execution environment primarily consists of:
⮡ an address space;
⮡ thread synchronization and communication resources such as semaphores and communication interfaces (for example sockets);
⮡ higher-level resources such as open files and windows.
Threads can be created and destroyed dynamically as needed. The central aim of having multiple threads of execution is to maximize the degree of concurrent execution between operations, thus enabling the overlap of computation with input and output, and enabling concurrent processing on multiprocessors.
Processes are heavyweight tasks that require their own separate address spaces.
Interprocess communication is expensive and limited.
Context switching from one process to another is also costly.
Threads, on the other hand, are lightweight.
They share the same address space and cooperatively share the same heavyweight process. Interthread communication is inexpensive, and context switching from one thread to the next is low cost.
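Because threads share the address space of their process, they can communicate through ordinary variables, provided that access is synchronized. The sketch below assumes four worker threads and a lock-protected shared counter; the names and iteration counts are chosen only for illustration.

```python
import threading

counter = 0                  # shared data: visible to every thread in the process
lock = threading.Lock()      # synchronization resource protecting the counter

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:           # only one thread updates the counter at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

Separate processes could not share the counter this directly; they would need an explicit interprocess communication mechanism.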
COMMUNICATION AND INVOCATION
Here we concentrate on communication as part of the implementation of what we have called an invocation - a construct such as a remote method invocation, remote procedure call or event notification.
We shall cover operating system design issues and concepts by asking the following questions about the OS:
1) What communication primitives does it supply?
2) Which protocols does it support and how open is the communication implementation?
3) What steps are taken to make communication as efficient as possible?
4) What support is provided for high-latency and disconnected operation?
Communication primitives: There are three communication primitives: doOperation, getRequest and sendReply. All these primitives are used to establish communication between various clients and servers.
Protocols and openness: One of the main requirements of the operating system is to provide standard protocols that enable interworking between middleware implementations on different platforms.
Invocation performance: Invocation performance is a critical factor in distributed system design. The more designers separate functionality between address spaces, the more remote invocations are required. Clients and servers may make many millions of invocation-related operations in their lifetimes, so that small fractions of milliseconds count in invocation costs. Network technologies continue to improve, but invocation times have not decreased in proportion with increases in network bandwidth.
RPC and RMI implementations have been the subject of study because of the widespread acceptance of these mechanisms for general-purpose client-server processing. Much of the research has been carried out into invocations over the network and particularly into how invocation mechanisms can take advantage of high-performance networks.
OPERATING SYSTEM ARCHITECTURE
In this section, we examine the architecture of a kernel suitable for a distributed system. We adopt a first-principles approach of starting with the requirement of openness and examining the major kernel architectures that have been proposed. Ideally, the kernel would provide only the most basic mechanisms upon which the general resource management tasks at a node are carried out.
Types of Kernels (Monolithic kernels and microkernels): There are two key examples of kernel design: the so-called monolithic and microkernel approaches. Where these designs differ primarily is in the decision as to what functionality belongs in the kernel and what is to be left to server processes that can be dynamically loaded to run on top of it. Microkernels, however, have not been deployed widely.
The UNIX operating system kernel has been called monolithic.
By contrast, in the case of a microkernel design the kernel provides only the most basic abstractions, principally address spaces, threads and local interprocess communication; all other system services are provided by servers.
DISTRIBUTED FILE SYSTEMS
A distributed file system enables programs to store and access remote files exactly as they do local ones, allowing users to access files from any computer on a network.
File systems were originally developed for centralized computer systems and desktop computers as an operating system facility providing a convenient programming interface to disk storage.
They subsequently acquired features such as access control and file-locking mechanisms that made them useful for the sharing of data and programs.
Distributed file systems support the sharing of information in the form of files and hardware resources in the form of persistent storage throughout an intranet.
A well designed file service provides access to files stored at a server with performance and reliability similar to, and in some cases better than, files stored on local disks.
File system modules:
File attribute record structure:
Files contain both data and attributes. The data consist of a sequence of data items (typically 8-bit
bytes), accessible by operations to read and write any portion of the sequence. The attributes are held
as a single record containing information such as the length of the file, timestamps, file type, owner’s
identity and access control lists. A typical attribute record structure is illustrated in Figure 12.3. The
shaded attributes are managed by the file system and are not normally updatable by user programs.
Characteristics of file systems:
1) File systems are responsible for the organization, storage, retrieval, naming, sharing and protection of files.
2) Files are stored on disks or other non-volatile storage media.
3) Files contain both data and attributes.
4) The data consist of a sequence of data items (typically 8-bit bytes) accessible by operations to read and write any portion of the sequence.
5) The attributes are held as a single record containing information such as the length of the file, timestamps, file type, owner's identity and access-control lists.
6) A typical attribute record structure is illustrated in the previous figure.
7) The shaded attributes are managed by the file system and are not normally updatable by user programs.
8) File systems are designed to store and manage large numbers of files with facilities for creating, naming and deleting files.
9) A directory is a file, often of a special type, that provides a mapping from text names to internal file identifiers. Directories may include the names of other directories.
Distributed file system requirements: Many of the requirements and potential pitfalls in the design of distributed services were first observed in the early development of distributed file systems.
1) Transparency: The file service is usually the most heavily loaded service in an intranet, so its functionality and performance are critical. The design of the file service should support many of the transparency requirements for distributed systems. The following forms of transparency are partially or wholly addressed by current file services:
i. Access transparency: Client programs should be unaware of the distribution of files. A single set of operations is provided for access to local and remote files. Programs written to operate on local files are able to access remote files without modification.
ii. Location transparency: Client programs should see a uniform file name space. Files or groups of files may be relocated without changing their pathnames, and user programs see the same name space wherever they are executed.
iii. Mobility transparency: Neither client programs nor system administration tables in client nodes need to be changed when files are moved. This allows file mobility - files or, more commonly, sets or volumes of files may be moved, either by system administrators or automatically.
iv. Performance transparency: Client programs should continue to perform satisfactorily while the load on the service varies within a specified range.
v. Scaling transparency: The service can be expanded by incremental growth to deal with a wide range of loads and network sizes.
2) Concurrent file updates: Changes to a file by one client should not interfere with the operation of other clients simultaneously accessing or changing the same file. The need for concurrency control for access to shared data in many applications is widely accepted and techniques are known for its implementation, but they are costly.
3) File replication: In a file service that supports replication, a file may be represented by several copies of its contents at different locations. This has two benefits - it enables multiple servers to share the load of providing a service to clients accessing the same set of files, enhancing the scalability of the service, and it enhances fault tolerance by enabling clients to locate another server that holds a copy of the file when one has failed.
4) Hardware and operating system heterogeneity: The service interfaces should be defined so that client and server software can be implemented for different operating systems and computers. This requirement is an important aspect of openness.
5) Fault tolerance: The central role of the file service in distributed systems makes it essential that the service continue to operate in the face of client and server failures.
6) Consistency: This refers to a model for concurrent access to files in which the file contents seen by all of the processes accessing or updating a given file are those that they would see if only a single copy of the file contents existed.
7) Security: Virtually all file systems provide access control mechanisms based on the use of access control lists. In distributed file systems, there is a need to authenticate client requests so that access control at the server is based on correct user identities and to protect the contents of request and reply messages with digital signatures and (optionally) encryption of secret data.
8) Efficiency: A distributed file service should offer facilities that are of at least the same power and generality as those found in conventional file systems and should achieve a comparable level of performance.
FILE SERVICE ARCHITECTURE
An architecture that offers a clear separation of the main concerns in providing access to files is obtained by structuring the file service as three components - a flat file service, a directory service and a client module.
The division of responsibilities between the modules can be defined as follows:
1) Flat file service: The flat file service is concerned with implementing operations on the contents of files. Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations. The division of responsibilities between the file service and the directory service is based upon the use of UFIDs.
2) Directory service: The directory service provides a mapping between text names for files and their UFIDs. Clients may obtain the UFID of a file by quoting its text name to the directory service.
3) Client module: A client module runs in each client computer, integrating and extending the operations of the flat file service and the directory service under a single application programming interface that is available to user-level programs in client computers.
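The division of responsibilities among the three components can be sketched with three small classes. Everything here is a simplified, single-process illustration: the method names, the counter-based UFID generator and the file name notes.txt are assumptions for the example, and real UFIDs would be unique across a whole distributed system.

```python
import itertools

class FlatFileService:
    """Operations on file contents, with files addressed only by UFID."""
    def __init__(self):
        self._files = {}
        self._next_ufid = itertools.count(1)   # simplistic UFID generator (assumption)

    def create(self):
        ufid = next(self._next_ufid)
        self._files[ufid] = bytearray()
        return ufid

    def write(self, ufid, position, data):
        contents = self._files[ufid]
        contents[position:position + len(data)] = data

    def read(self, ufid, position, length):
        return bytes(self._files[ufid][position:position + length])

class DirectoryService:
    """Mapping between text names and UFIDs."""
    def __init__(self):
        self._names = {}

    def add_name(self, name, ufid):
        self._names[name] = ufid

    def lookup(self, name):
        return self._names[name]

class ClientModule:
    """Single interface for user programs, combining both services."""
    def __init__(self, files, directory):
        self.files, self.directory = files, directory

    def write_file(self, name, data):
        try:
            ufid = self.directory.lookup(name)
        except KeyError:                       # unknown name: create a new file
            ufid = self.files.create()
            self.directory.add_name(name, ufid)
        self.files.write(ufid, 0, data)

    def read_file(self, name, length):
        return self.files.read(self.directory.lookup(name), 0, length)

client = ClientModule(FlatFileService(), DirectoryService())
client.write_file("notes.txt", b"hello")
print(client.read_file("notes.txt", 5))
```

Note that the flat file service never sees a text name and the directory service never sees file contents; the UFID is the only thing they have in common.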
CASE STUDY: SUN NETWORK FILE SYSTEM
The figure below shows the architecture of Sun NFS. It follows the abstract model defined in the preceding section. All implementations of NFS support the NFS protocol - a set of remote procedure calls that provide the means for clients to perform operations on a remote file store.
The NFS server module resides in the kernel on each computer that acts as an NFS server. Requests referring to files in a remote file system are translated by the client module to NFS protocol operations and then passed to the NFS server module at the computer holding the relevant file system.
The NFS client and server modules communicate using remote procedure calls. Sun's RPC system was developed for use in NFS. It can be configured to use either UDP or TCP, and the NFS protocol is compatible with both. A port mapper service is included to enable clients to bind to services in a given host by name. The RPC interface to the NFS server is open: any process can send requests to an NFS server; if the requests are valid and they include valid user credentials, they will be acted upon. The submission of signed user credentials can be required as an optional security feature, as can the encryption of data for privacy and integrity.
NFS adopts the UNIX mountable filesystem as the unit of file grouping defined in the preceding section.
(Terminology note: the single word filesystem refers to the set of files held in a storage device or partition, whereas the words file system refer to a software component that provides access to files.) In an NFS file handle, the filesystem identifier field is a unique number that is allocated to each filesystem when it is created (and in the UNIX implementation is stored in the superblock of the file system). The i-node generation number is needed because in the conventional UNIX file system i-node numbers are reused after a file is removed.
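The purpose of the i-node generation number can be shown with a toy model. This sketch assumes a file handle made of three fields (filesystem identifier, i-node number, generation number); the class ToyFilesystem and its methods are hypothetical and do not reproduce the actual NFS or UNIX data structures.

```python
from collections import namedtuple

FileHandle = namedtuple("FileHandle", "filesystem_id inode_number generation")

class ToyFilesystem:
    def __init__(self, filesystem_id):
        self.filesystem_id = filesystem_id
        self.generation = {}   # i-node number -> current generation number
        self.live = set()      # i-node numbers currently in use

    def create(self, inode_number):
        """Allocate (or reuse) an i-node, incrementing its generation number."""
        self.generation[inode_number] = self.generation.get(inode_number, 0) + 1
        self.live.add(inode_number)
        return FileHandle(self.filesystem_id, inode_number,
                          self.generation[inode_number])

    def remove(self, inode_number):
        self.live.discard(inode_number)

    def is_valid(self, handle):
        """A handle is valid only if all three fields match a live file."""
        return (handle.filesystem_id == self.filesystem_id
                and handle.inode_number in self.live
                and self.generation.get(handle.inode_number) == handle.generation)

fs = ToyFilesystem(filesystem_id=7)
old = fs.create(42)
fs.remove(42)
new = fs.create(42)    # i-node number 42 is reused for a different file
print(fs.is_valid(old), fs.is_valid(new))
```

Without the generation field, the old handle would silently refer to the new file that happens to reuse the same i-node number.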
*****
UNIT–III
PEER TO PEER SYSTEMS
INTRODUCTION
The demand for services in the Internet can be expected to grow to a scale that is limited only by the size of the world's population.
The goal of peer-to-peer systems is to enable the sharing of data and resources on a very large scale by eliminating any requirement for separately-managed servers and their associated infrastructure.
Peer-to-peer systems aim to support useful distributed services and applications using data and computing resources available in the personal computers and workstations that are present on the Internet and other networks in ever-increasing numbers.
Traditional client-server systems manage and provide access to resources such as files, web pages or other information objects located on a single server computer or a small cluster of tightly-coupled servers. With such centralized designs few decisions are required about the placement of the resources or the management of server hardware resources, but the scale of the service is limited by the server hardware capacity and network connectivity.
Peer-to-peer systems provide access to information resources located on computers throughout a network (whether it be the Internet or a corporate network).
NAPSTER AND ITS LEGACY
The first application in which a demand for a globally-scalable information storage and retrieval service emerged was the downloading of digital music files.
Both the need and the feasibility of a peer-to-peer solution were first demonstrated by the Napster file sharing system, which provided a means for users to share files.
Napster became very popular for music exchange soon after its launch in 1999. At its peak, several million users were registered and thousands were swapping music files simultaneously.
Napster's architecture included centralized indexes, but users supplied the files, which were stored and accessed on their personal computers.
Napster's method of operation is illustrated by the sequence of steps shown in the above Figure.
Note that in step 5 clients are expected to add their own music files to the pool of shared resources by transmitting a link to the Napster indexing service for each available file.
Thus the motivation for Napster and the key to its success was to make a large, widely-distributed set of files available to users throughout the Internet.
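The idea of a centralized index over files held on users' own computers can be sketched as a mapping from file names to peer addresses. The class name CentralIndex, the addresses and the file names are invented for this illustration and say nothing about Napster's actual protocol or data formats.

```python
from collections import defaultdict

class CentralIndex:
    """Central index: records which peers hold which files, never the files themselves."""
    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def register(self, peer_address, file_names):
        """A peer adds links to the files it is willing to share."""
        for name in file_names:
            self._peers_by_file[name].add(peer_address)

    def locate(self, file_name):
        """Return the addresses of peers that hold the file (empty if none)."""
        return sorted(self._peers_by_file.get(file_name, set()))

index = CentralIndex()
index.register("10.0.0.5:6699", ["song_a.mp3", "song_b.mp3"])
index.register("10.0.0.9:6699", ["song_a.mp3"])
print(index.locate("song_a.mp3"))
```

After a lookup, the requesting client would fetch the file directly from one of the listed peers, so the index server carries only the lookup load.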
PEER TO PEER MIDDLEWARE
A key problem in the design of peer-to-peer applications is to provide a mechanism to enable clients to access data resources quickly and dependably wherever they are located throughout the network.
Napster maintained a unified index of available files for this purpose, giving the network addresses of their hosts.
Peer-to-peer middleware systems are designed specifically to meet the need for the automatic placement and subsequent location of the distributed objects managed by peer-to-peer systems and applications.
Functional requirements: The function of peer-to-peer middleware is to simplify the construction of services that are implemented across many hosts in a widely distributed network. To achieve this it must enable clients to locate and communicate with any individual resource made available to a service, even though the resources are widely distributed amongst the hosts.
Non-functional requirements: To perform effectively, peer-to-peer middleware must also address the following non-functional requirements.
Global scalability: One of the aims of peer-to-peer applications is to exploit the hardware resources of very large numbers of hosts connected to the Internet. Peer-to-peer middleware must therefore be designed to support applications that access millions of objects on tens of thousands or hundreds of thousands of hosts.
Load balancing: The performance of any system designed to exploit a large number of computers depends upon the balanced distribution of workload across them.
Optimization for local interactions between neighbouring peers: The network distance between nodes that interact has a substantial impact on the latency of individual interactions such as client requests for access to resources.
Accommodating to highly dynamic host availability: Most peer-to-peer systems are constructed from host computers that are free to join or leave the system at any time.
Page49of88
ROUTING OVERLAYS
A distributed algorithm known as a routing overlay takes responsibility for locating nodes and objects.
The name denotes the fact that the middleware takes the form of a layer that is responsible for routing requests from any client to a host that holds the object to which the request is addressed.
The objects of interest may be placed and subsequently relocated to any node in the network without client involvement.
It is termed an overlay since it implements a routing mechanism in the application layer that is quite separate from any other routing mechanisms deployed at the network level, such as IP routing.
The routing overlay ensures that any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each of them to locate the destination object.
Peer-to-peer systems usually store multiple replicas of objects to ensure availability.
In that case, the routing overlay maintains knowledge of the location of all the available replicas and delivers requests to the nearest 'live' node (i.e. one that has not failed) that has a copy of the relevant object.
The main task of a routing overlay is the following:
⮡ A client wishing to invoke an operation on an object submits a request including the object's GUID to the routing overlay, which routes the request to a node at which a replica of the object resides.
OVERLAY CASE STUDIES: PASTRY & TAPESTRY
The prefix routing approach is adopted by both Pastry and Tapestry. Pastry has a straightforward but effective design, which makes it a good first example for us to study in detail. Pastry is the message routing infrastructure deployed in several applications.
Pastry: All the nodes and objects that can be accessed through Pastry are assigned 128-bit GUIDs. For nodes, these are computed by applying a secure hash function (such as SHA-1) to the public key with which each node is provided. For objects such as files, the GUID is computed by applying a secure hash function to the object's name or to some part of the object's stored state. The resulting GUID has the usual properties of secure hash values.
In a network with N participating nodes, the Pastry routing algorithm will correctly route a message addressed to any GUID in O(log N) steps. If the GUID identifies a node that is currently active, the message is delivered to that node; otherwise the message is delivered to the active node whose GUID is numerically closest to it. Active nodes take responsibility for processing requests addressed to all objects in their numerical neighbourhood.
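The delivery rule above can be illustrated with a small sketch. It is not Pastry's routing algorithm (there is no prefix routing table here and every node is assumed to know all active nodes); it only shows how a 128-bit GUID might be derived from a secure hash and how the numerically closest active node could be selected. The function names, the truncation of SHA-1 to 128 bits, and the treatment of the GUID space as circular are assumptions made for illustration.

```python
import hashlib

ID_BITS = 128
ID_SPACE = 1 << ID_BITS  # number of distinct 128-bit GUIDs


def guid_from_bytes(data: bytes) -> int:
    """Hash the input with SHA-1 and keep the first 128 bits as a GUID (assumed truncation)."""
    digest = hashlib.sha1(data).digest()          # 20 bytes = 160 bits
    return int.from_bytes(digest[:16], "big")     # first 16 bytes = 128 bits


def ring_distance(a: int, b: int) -> int:
    """Numerical distance between two GUIDs, assuming the GUID space wraps around."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def responsible_node(target: int, active_nodes: list[int]) -> int:
    """Return the target itself if it is an active node, else the numerically closest one."""
    # Ties are broken by the smaller GUID so the result is deterministic.
    return min(active_nodes, key=lambda n: (ring_distance(n, target), n))
```

For instance, `responsible_node(guid_from_bytes(b"some-object-name"), nodes)` would name the node that takes responsibility for that object under these assumptions.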
Tapestry: Tapestry implements a distributed hash table and routes messages to nodes based on GUIDs associated with resources, using prefix routing in a manner similar to Pastry. Nodes that hold resources use the publish(GUID) primitive to make them known to Tapestry; the holders of resources remain responsible for storing them. Replicated resources are published with the same GUID by each node that holds a replica, resulting in multiple entries in the Tapestry routing structure.
This gives Tapestry applications additional flexibility: they can place replicas close (in network distance) to frequent users of resources in order to reduce latencies and minimize network loads, or to ensure tolerance of network and host failures.
In Tapestry, 160-bit identifiers are used to refer both to objects and to the nodes that perform routing actions. Identifiers are either NodeIds, which refer to computers that perform routing operations, or GUIDs, which refer to the objects.
APPLICATION CASE STUDIES: SQUIRREL & OCEANSTORE
Large-scale peer-to-peer systems are not yet a mainstream technology. Their widest deployment has been in applications for file downloading by end-users, in systems such as Napster, Freenet, Gnutella, Kazaa and BitTorrent. But those systems do not employ separate routing overlay layers, so evaluations of their performance are difficult to extrapolate to other applications.
The routing overlay layers described in the preceding section have been exploited in several application experiments, and the resulting applications have been extensively evaluated.
We have chosen two of them for further study: the Squirrel web caching service, based on Pastry, and the OceanStore file store.
Squirrel web cache: The authors of Pastry have developed the Squirrel peer-to-peer web caching service for use in local networks of personal computers. In medium and large local networks, web caching is typically performed using a dedicated server computer or cluster. The Squirrel system performs the same task by exploiting storage and computing resources already available on desktop computers in the local network. We first give a brief general description of the operation of a web caching service, then we outline the design of Squirrel and review its effectiveness.
Web caching: Web browsers generate HTTP GET requests for Internet objects like HTML pages, images etc. These may be serviced from a browser cache on the client machine, from a proxy web cache (a service running on another computer in the same local network or on a nearby node in the Internet) or from the origin web server (the server whose domain name is included in the parameters of the GET request), depending on which contains a fresh copy of the object.
The local and proxy caches each contain a set of recently retrieved objects organized for fast lookup by URL. Some objects are uncacheable because they are generated dynamically by the server in response to each request.
When a browser cache or proxy web cache receives a GET request there are three possibilities: the requested object is uncacheable, there is a cache miss, or the object is found in the cache. In the first two cases the request is forwarded to the next level towards the origin web server. When the requested object is found in a cache, the cached copy must be tested for freshness.
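The three cases a cache faces on a GET request can be sketched as follows. This is a minimal illustration, not how any particular browser or proxy is implemented: the class and parameter names are hypothetical, freshness is modelled as a simple expiry time, and a stale copy is simply re-fetched from the next level (real HTTP caches may instead validate the copy with the server).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CacheEntry:
    body: bytes
    expires_at: float  # after this time the copy is no longer treated as fresh


class WebCache:
    def __init__(self,
                 forward: Callable[[str], Tuple[bytes, float]],
                 is_uncacheable: Callable[[str], bool] = lambda url: False):
        # forward(url) asks the next level (proxy or origin server) and
        # returns (body, lifetime_in_seconds); both are assumptions of this sketch.
        self.forward = forward
        self.is_uncacheable = is_uncacheable
        self.entries: Dict[str, CacheEntry] = {}

    def get(self, url: str, now: float) -> bytes:
        # Case 1: uncacheable object, always forwarded and never stored.
        if self.is_uncacheable(url):
            body, _ = self.forward(url)
            return body
        entry = self.entries.get(url)
        # Case 2: cache miss (or a stale copy, handled the same way here).
        if entry is None or now >= entry.expires_at:
            body, lifetime = self.forward(url)
            self.entries[url] = CacheEntry(body, now + lifetime)
            return body
        # Case 3: object found and still fresh, served from the cache.
        return entry.body
```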
Squirrel: The Squirrel web caching service performs the same functions using a small part of the resources of each client computer on a local network. The SHA-1 secure hash function is applied to the URL of each cached object to produce a 128-bit Pastry GUID. In the simplest implementation of Squirrel (which proved to be the most effective one) the node whose GUID is numerically closest to the GUID of an object becomes that object's home node, responsible for holding any cached copy of the object.
Client nodes are configured to include a local Squirrel proxy process, which takes responsibility for both local and remote caching of web objects. If a fresh copy of a required object is not in the local cache, Squirrel routes a Get request via Pastry to the object's home node. If the home node has a fresh copy, it directly responds to the client with a not-modified message or a fresh copy, as appropriate.
Evaluation of Squirrel: The evaluation compared the performance of a Squirrel web cache with a centralized one in three respects:
1) The reduction in total external bandwidth used: The total external bandwidth is inversely related to the hit ratio, since it is only cache misses that generate requests to external web servers.
2) The latency perceived by users for access to web objects: The use of a routing overlay results in several message transfers (routing hops) across the local network to transmit a request from a client to the host responsible for caching the relevant object (the home node).
3) The computational and storage load imposed on client nodes: The average number of cache requests served for other nodes by each node over the whole period of the evaluation was extremely low.
OceanStore file store: The developers of Tapestry have designed and built a prototype for a peer-to-peer file store. The OceanStore design aims to provide a very large scale, incrementally scalable persistent storage facility for mutable data objects, with long-term persistence and reliability in an environment of constantly changing network and computing resources. The design includes provision for the replicated storage of both mutable and immutable data objects.
Three types of GUIDs are used, as summarized in the figure above. The first two are GUIDs of the type normally assigned to objects stored in Tapestry; they are computed from the contents of the relevant block using a secure hash function, so that they can be used later to authenticate and verify the integrity of the contents. The third type of identifier used is AGUIDs. These refer (indirectly) to the entire stream of versions of an object, enabling clients to access the current version of the object or any previous version.
TIME AND GLOBAL STATES
INTRODUCTION
Time is an important and interesting issue in distributed systems for several reasons. First, time is a quantity we often want to measure accurately. In order to know at what time of day a particular event occurred at a particular computer, it is necessary to synchronize its clock with an authoritative external source of time. For example, an ‘e-commerce’ transaction involves events at a merchant's computer and at a bank's computer. It is important for auditing purposes that those events are timestamped accurately.
CLOCKS, EVENTS AND PROCESS STATES
Clocks: We have seen how to order the events at a process but not how to timestamp them, that is, to assign to them a date and time of day. Computers each contain their own physical clock. These clocks are electronic devices that count oscillations occurring in a crystal at a definite frequency, and that typically divide this count and store the result in a counter register.
Clock skew and clock drift: Computer clocks, like any others, tend not to be in perfect agreement, as shown in the figure below. The instantaneous difference between the readings of any two clocks is called their skew. Also, the crystal-based clocks used in computers are, like any other clocks, subject to clock drift, which means that they count time at different rates, and so diverge. The underlying oscillators are subject to physical variations, with the consequence that their frequencies of oscillation differ. Moreover, even the same clock's frequency varies with temperature. Designs exist that attempt to compensate for this variation, but they cannot eliminate it. The difference in the oscillation period between two clocks might be extremely small, but the difference accumulated over many oscillations leads to an observable difference in the counters registered by two clocks, no matter how accurately they were initialized to the same value.
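The way a tiny rate difference accumulates into an observable skew can be made concrete with a short calculation. The drift rates below are illustrative assumptions (seconds gained or lost per second of real time), not measurements of any particular hardware, and the function name is hypothetical.

```python
def accumulated_skew(drift_rate_a: float, drift_rate_b: float,
                     elapsed_seconds: float) -> float:
    """Skew between two clocks that started in agreement and drift at constant rates."""
    return abs(drift_rate_a - drift_rate_b) * elapsed_seconds


# Two clocks drifting in opposite directions at one part in a million,
# observed after one day (86400 seconds) of real time.
skew_after_one_day = accumulated_skew(1e-6, -1e-6, 86_400)
```

Under these assumed rates the two clocks disagree by roughly 0.17 seconds after a day, even though each oscillation differs only minutely.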
Coordinated Universal Time: Computer clocks can be synchronized to external sources of highly accurate time. Coordinated Universal Time, abbreviated as UTC (from the French equivalent), is an international standard for timekeeping. It is based on atomic time, but a so-called leap second is inserted (or, more rarely, deleted) occasionally to keep in step with astronomical time. UTC signals are synchronized and broadcast regularly from land-based radio stations and satellites covering many parts of the world.
For example, in the USA, the radio station WWV broadcasts time signals on several shortwave frequencies. Satellite sources include the Global Positioning System (GPS).
SYNCHRONIZING PHYSICAL CLOCKS
In order to know at what time of day events occur at the processes in our distributed system (for example, for accountancy purposes) it is necessary to synchronize the processes' clocks Ci with an authoritative external source of time. This is external synchronization. And if the clocks Ci are synchronized with one another to a known degree of accuracy, then we can measure the interval between two events occurring at different computers by appealing to their local clocks, even though they are not necessarily synchronized to an external source of time. This is internal synchronization.
Cristian's method for synchronizing clocks: Cristian suggested the use of a time server, connected to a device that receives signals from a source of UTC, to synchronize computers externally. Upon request, the server process S supplies the time according to its clock, as shown in the figure below.
Cristian observed that while there is no upper bound on message transmission delays in an asynchronous system, the round-trip times for messages exchanged between pairs of processes are often reasonably short, a small fraction of a second. He describes the algorithm as probabilistic: the method achieves synchronization only if the observed round-trip times between client and server are sufficiently short compared with the required accuracy.
A process p requests the time in a message mr, and receives the time value t in a message mt (t is inserted in mt at the last possible point before transmission from S's computer). Process p records the total round-trip time Tround taken to send the request mr and receive the reply mt. It can measure this time with reasonable accuracy if its rate of clock drift is small.
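A minimal sketch of the client side of Cristian's method follows. It uses the standard estimate that the client sets its clock to t + Tround/2, which assumes the delay is split roughly equally between request and reply; if a minimum one-way transmission time is known, the estimate is accurate to within plus or minus (Tround/2 - min). The function and parameter names are hypothetical, and `request_server_time` stands in for the exchange of mr and mt with the server S.

```python
import time
from typing import Callable, Tuple


def cristian_estimate(request_server_time: Callable[[], float],
                      clock: Callable[[], float] = time.monotonic,
                      min_delay: float = 0.0) -> Tuple[float, float]:
    """Return (estimated current time, accuracy bound) from one request to the server."""
    sent_at = clock()                 # just before sending the request mr
    t = request_server_time()         # time value t carried in the reply mt
    t_round = clock() - sent_at       # total round-trip time Tround
    estimate = t + t_round / 2        # assume the reply took about half the round trip
    accuracy = t_round / 2 - min_delay
    return estimate, accuracy
```

A client that needs a given accuracy could repeat the request and discard any result whose bound is too large, which matches the probabilistic character of the method.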
Cristian's method suffers from the problem associated with all services implemented by a single server: the single time server might fail and thus render synchronization temporarily impossible. Cristian suggested, for this reason, that time should be provided by a group of synchronized time servers, each with a receiver for UTC time signals. For example, a client could multicast its request to all servers and use only the first reply obtained.
The Berkeley algorithm: It is an internal synchronization method in which a coordinator computer is chosen to act as the master. Unlike in Cristian's protocol, this computer periodically polls the other computers whose clocks are to be synchronized, called slaves. The slaves send back their clock values to it. The master estimates their local clock times by observing the round-trip times (similarly to Cristian's technique), and it averages the values obtained (including its own clock's reading).
Instead of sending the updated current time back to the other computers, which would introduce further uncertainty due to the message transmission time, the master sends the amount by which each individual slave's clock requires adjustment. This can be a positive or negative value.
The algorithm eliminates readings from faulty clocks. Such clocks could have a significant adverse effect if an ordinary average was taken. The master takes a fault-tolerant average. That is, a subset of clocks is chosen that do not differ from one another by more than a specified amount, and the average is taken of readings from only these clocks.
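The fault-tolerant average and the per-clock adjustments can be sketched as below. The text does not say how the subset of agreeing clocks is chosen, so this sketch assumes the largest group of readings that lie within `max_spread` of one another; the function name and parameters are hypothetical, and the slave readings are assumed to have already been corrected for round-trip time by the master.

```python
from typing import List


def berkeley_adjustments(master_time: float, slave_times: List[float],
                         max_spread: float) -> List[float]:
    """Return the adjustment for each clock (master first, then slaves in order)."""
    readings = [master_time] + list(slave_times)
    # Find the largest group of readings that differ by at most max_spread.
    best_group: List[float] = []
    for low in sorted(readings):
        group = [r for r in readings if low <= r <= low + max_spread]
        if len(group) > len(best_group):
            best_group = group
    # Fault-tolerant average: readings outside the group are ignored.
    average = sum(best_group) / len(best_group)
    # Each clock is sent the amount (positive or negative) by which to adjust.
    return [average - r for r in readings]
```

With a master reading of 100 and slave readings of 102, 98 and 500, a spread of 10 excludes the faulty reading of 500 from the average, yet that clock still receives an adjustment that brings it back into line.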
The Network Time Protocol:
Cristian's method and the Berkeley algorithm are intended primarily for use within intranets.
The Network Time Protocol (NTP) defines an architecture for a time service and a protocol to distribute time information over the Internet.
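LOGICAL TIME AND LOGICAL CLOCKS
From the point of view of any single process, events are ordered uniquely by times shown on the local clock. However, as Lamport pointed out, since we cannot synchronize clocks perfectly across a distributed system, we cannot in general use physical time to find out the order of any arbitrary pair of events occurring within it.
In general, we can use a scheme that is similar to physical causality, but that applies in distributed systems, to order some of the events that occur at different processes. This ordering is based on two simple and intuitively obvious points:
If two events occurred at the same process, then they occurred in the order in which that process observes them.
Whenever a message is sent between processes, the event of sending the message occurred before the event of receiving the message.
Lamport called the partial ordering obtained by generalizing these two relationships the happened-before relation.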
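One common way to capture the happened-before relation numerically is a Lamport logical clock: a per-process counter that is incremented before each local event and each send, and that is advanced past the timestamp carried by any message received. The sketch below is a minimal illustration with hypothetical class and method names; it guarantees only that if one event happened before another then its timestamp is smaller, not the converse.

```python
class LamportClock:
    """A per-process logical clock in the style proposed by Lamport."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        # Events at the same process are ordered by incrementing the counter.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the returned value is piggybacked on the message.
        return self.local_event()

    def receive(self, message_time: int) -> int:
        # The receive event must be later than both the send event and
        # every earlier event at the receiving process.
        self.time = max(self.time, message_time) + 1
        return self.time
```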
GLOBAL STATES
In this and the next section we shall examine the problem of finding out whether a particular property is true of a distributed system as it executes. We begin by giving the examples of distributed garbage collection, deadlock detection, termination detection and debugging.
The 'snapshot' algorithm of Chandy and Lamport:
DISTRIBUTED DEBUGGING
We now examine the problem of recording a system's global state so that we may make useful statements about whether a transitory state (as opposed to a stable state) occurred in an actual execution. This is what we require, in general, when debugging a distributed system.
Observing consistent global states:
COORDINATION AND AGREEMENT
INTRODUCTION
Failure assumptions and failure detectors:
DISTRIBUTED MUTUAL EXCLUSION
Algorithms for mutual exclusion:
ELECTIONS
MULTICAST COMMUNICATION
Basic multicast:
Reliable multicast:
Ordered multicast:
CONSENSUS AND RELATED PROBLEMS
*****
UNIT–IV
TRANSACTIONS AND CONCURRENCY CONTROL
INTRODUCTION
TRANSACTIONS
NESTED TRANSACTIONS
LOCKS
Deadlocks:
OPTIMISTIC CONCURRENCY CONTROL
TIMESTAMP ORDERING
*****
UNIT–V
REPLICATION
INTRODUCTION
In this chapter, we study the replication of data: the maintenance of copies of data at multiple computers. Replication is a key to the effectiveness of distributed systems in that it can provide enhanced performance, high availability and fault tolerance. Replication is used widely. For example, the caching of resources from web servers in browsers and web proxy servers is a form of replication, since the data held in caches and at servers are replicas of one another. The DNS naming service maintains copies of name-to-attribute mappings for computers and is relied on for day-to-day access to services across the Internet.
Replication is a technique for enhancing services. The motivations for replication include:
1) Performance enhancement: The caching of data at clients and servers is by now familiar as a means of performance enhancement. For example, browsers and proxy servers cache copies of web resources to avoid the latency of fetching resources from the originating server. Furthermore, data are sometimes replicated transparently between several originating servers in the same domain. The workload is shared between the servers by binding all the server IP addresses to the site's DNS name, say www.aWebSite.org. A DNS lookup of www.aWebSite.org results in one of the several servers' IP addresses being returned, in a round-robin fashion. More sophisticated load-balancing strategies are required for more complex services based on data replicated between thousands of servers.
2) Increased availability: Users require services to be highly available. That is, the proportion of time for which a service is accessible with reasonable response times should be close to 100%. Apart from delays due to pessimistic concurrency control conflicts (due to data locking), the factors that are relevant to high availability are:
server failures;
network partitions and disconnected operation (communication disconnections that are often unplanned and are a side effect of user mobility).
To take the first of these, replication is a technique for automatically maintaining the availability of data despite server failures. If data are replicated at two or more failure-independent servers, then client software may be able to access data at an alternative server should the default server fail or become unreachable. That is, the percentage of time during which the service is available can be enhanced by replicating server data. If each of n servers has an independent probability p of failing or becoming unreachable, then the availability of an object stored at each of these servers is:
1 - probability(all servers failed or unreachable) = 1 - p^n
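Because the n servers fail independently, the object is unavailable only when all of them are down at once, which happens with probability p^n; the availability is therefore 1 - p^n. The short sketch below evaluates this for illustrative values; the function name and the numbers are assumptions chosen only for the example.

```python
def availability(p: float, n: int) -> float:
    """Availability of an object replicated at n failure-independent servers,
    each failing or becoming unreachable with probability p."""
    return 1 - p ** n


# Illustrative values: a 5% chance that any one server is unavailable.
single_server = availability(0.05, 1)
two_servers = availability(0.05, 2)
```

With these assumed numbers, one server gives 95% availability and two servers give 99.75%.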
SYSTEM MODEL & GROUP COMMUNICATION
The data in our system consist of a collection of items that we shall call objects. An ‘object’ could be a file, say, or a Java object. But each such logical object is implemented by a collection of physical copies called replicas. The replicas are physical objects, each stored at a single computer, with data and behaviour that are tied to some degree of consistency by the system's operation. The ‘replicas’ of a given object are not necessarily identical, at least not at any particular point in time.
Some replicas may have received updates that others have not received. In this section, we provide a general system model for managing replicas and then describe the role of group communication systems in achieving fault tolerance through replication, highlighting the importance of view-synchronous group communication.
System model: We assume an asynchronous system in which processes may fail only by crashing. Our default assumption is that network partitions may not occur, but we shall sometimes consider what happens if they do occur. Network partitions make it harder to build failure detectors, which we use to achieve reliable and totally ordered multicast.
For the sake of generality, we describe architectural components by their roles and do not mean to imply that they are necessarily implemented by distinct processes (or hardware). The model involves replicas held by distinct replica managers (see Figure 18.1), which are components that contain the replicas on a given computer and perform operations upon them directly. This general model may be applied in a client-server environment, in which case a replica manager is a server.
We shall sometimes simply call them servers instead. Equally, it may be applied to an application, and application processes can in that case act as both clients and replica managers. For example, the user's laptop on a train may contain an application that acts as a replica manager for their diary.
We shall always require that a replica manager applies operations to its replicas recoverably. This allows us to assume that an operation at a replica manager does not leave inconsistent results if it fails part way through. Such a replica manager applies operations to its replicas atomically (indivisibly), so that its execution is equivalent to performing operations in some strict sequence.
Moreover, the state of its replicas is a deterministic function of their initial states and the sequence of operations that it applies to them. Other stimuli, such as the reading on a clock or an attached sensor, have no bearing on these state values. Without this assumption, consistency guarantees between replica managers that accept update operations independently could not be made. The system can only determine which operations to apply at all replica managers and in what order; it cannot reproduce non-deterministic effects. The assumption implies that it may not be possible, depending upon the threading architecture, for the servers to be multi-threaded.
Often each replica manager maintains a replica of every object, and we assume this is so unless we state otherwise. However, the replicas of different objects may be maintained by different sets of replica managers. For example, one object may be needed mostly by clients on one network and another by clients on another network. There is little to be gained by replicating them at managers on the other network.
In general, five phases are involved in the performance of a single request upon the replicated objects [Wiesmann et al. 2000]. The actions in each phase vary according to the type of system, as will become clear in the next two sections. For example, a service that supports disconnected operation behaves differently from one that provides a fault-tolerant service. The phases are as follows:
1) Request: The front end issues the request to one or more replica managers:
either the front end communicates with a single replica manager, which in turn communicates with other replica managers;
or the front end multicasts the request to the replica managers.
2) Coordination: The replica managers coordinate in preparation for executing the request consistently. They agree, if necessary at this stage, on whether the request is to be applied (it might not be applied at all if failures occur at this stage). They also decide on the ordering of this request relative to others.
i. FIFO ordering: If a front end issues request r and then request r', any correct replica manager that handles r' handles r before it.
ii. Causal ordering: If the issue of request r happened-before the issue of request r', then any correct replica manager that handles r' handles r before it.
iii. Total ordering: If a correct replica manager handles r before request r', then any correct replica manager that handles r' handles r before it.
FAULT TOLERANT SERVICES
In this section, we examine how to provide a service that is correct despite up to f process failures, by replicating data and functionality at replica managers. For the sake of simplicity, we assume that communication remains reliable and that no partitions occur. Each replica manager is assumed to behave according to a specification of the semantics of the objects it manages, when they have not crashed. For example, a specification of bank accounts would include an assurance that funds transferred between bank accounts can never disappear, and that only deposits and withdrawals affect the balance of any particular account.
Intuitively, a service based on replication is correct if it keeps responding despite failures and if clients cannot tell the difference between the service they obtain from an implementation with replicated data and one provided by a single correct replica manager. Care is needed in meeting these criteria. If precautions are not taken, then anomalies can arise when there are several replica managers, even bearing in mind that we are considering the effects of individual operations, not transactions.
Consider a naive replication system, in which a pair of replica managers at computers A and B each maintain replicas of two bank accounts, x and y. Clients read and update the accounts at their local replica manager but try another replica manager if the local one fails.
Replica managers propagate updates to one another in the background after responding to the clients. Both accounts initially have a balance of $0.
Client 1 updates the balance of x at its local replica manager B to be $1 and then attempts to update y's balance to be $2, but discovers that B has failed. Client 1 therefore applies the update at A instead. Now client 2 reads the balances at its local replica manager A. It finds first that y has $2 and then that x has $0: the update to bank account x from B has not arrived, since B failed. The situation is shown below, where the operations are labelled by the computer at which they first took place and lower operations happen later:
Client 1:                  Client 2:
setBalanceB(x, 1)
setBalanceA(y, 2)
                           getBalanceA(y) → 2
                           getBalanceA(x) → 0
This execution does not match a common-sense specification for the behaviour of bank accounts: client 2 should have read a balance of $1 for x, given that it read the balance of $2 for y, since y's balance was updated after that of x. The anomalous behaviour in the replicated case could not have occurred if the bank accounts had been implemented by a single server. We can construct systems that manage replicated objects without the anomalous behaviour produced by the naive protocol in our example. First, we need to understand what counts as correct behaviour for a replicated system.
A replicated shared object service is said to be linearizable if for any execution there is some interleaving of the series of operations issued by all the clients that satisfies the following two criteria:
1) The interleaved sequence of operations meets the specification of a (single) correct copy of the objects.
2) The order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution.
This definition captures the idea that for any set of client operations there is a virtual canonical execution (the interleaved operations that the definition refers to) against a virtual single image of the shared objects. And each client sees a view of the shared objects that is consistent with that single image: that is, the results of the client's operations make sense as they occur within the interleaving.
The service that gave rise to the execution of the bank account clients in the preceding example is not linearizable. Even ignoring the real time at which the operations took place, there is no interleaving of the two clients' operations that would satisfy any correct bank account specification: for auditing purposes, if one account update occurred after another, then the first update should be observed if the second has been observed.
Note that linearizability concerns only the interleaving of individual operations and is not intended to be transactional. A linearizable execution may break application-specific notions of consistency if concurrency control is not applied.
The real-time requirement in linearizability is desirable in an ideal world, because it captures our notion that clients should receive up-to-date information. But, equally, the presence of real time in the definition raises the issue of linearizability's practicality, because we cannot always synchronize clocks to the required degree of accuracy. A weaker correctness condition is sequential consistency, which captures an essential requirement concerning the order in which requests are processed without appealing to real time. The definition keeps the first criterion from the definition for linearizability but modifies the second.
A replicated shared object service is said to be sequentially consistent if for any execution there is some interleaving of the series of operations issued by all the clients that satisfies the following two criteria:
1) The interleaved sequence of operations meets the specification of a (single) correct copy of the objects.
2) The order of operations in the interleaving is consistent with the program order in which each individual client executed them.
Note that absolute time does not appear in this definition. Nor does any other total order on all operations. The only notion of ordering that is relevant is the order of events at each separate client, the program order. The interleaving of operations can shuffle the sequence of operations from a set of clients in any order, as long as each client's order is not violated and the result of each operation is consistent, in terms of the objects' specification, with the operations that preceded it. This is similar to shuffling together several packs of cards so that they are intermingled in such a way as to preserve the original order of each pack.
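The definition of sequential consistency can be checked by brute force for very small executions: enumerate every interleaving that preserves each client's program order and test whether any of them is legal for a single correct copy of the objects. The sketch below applies this to the bank account execution described earlier (client 1 sets x to 1 and then y to 2; client 2 reads y as 2 and then x as 0). The operation encoding and function names are assumptions for illustration, and the approach only scales to toy examples.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

Op = Tuple[str, str, int]  # (kind, account, value): kind is "set" or "get"


def interleavings(a: List[Op], b: List[Op]) -> Iterator[List[Op]]:
    """Yield every merge of a and b that preserves the order within each list."""
    total = len(a) + len(b)
    for positions in combinations(range(total), len(a)):
        slots = set(positions)
        next_a, next_b = iter(a), iter(b)
        yield [next(next_a) if i in slots else next(next_b) for i in range(total)]


def legal(sequence: List[Op]) -> bool:
    """Check the sequence against a single correct copy of accounts x and y."""
    balances = {"x": 0, "y": 0}  # both accounts start at $0
    for kind, account, value in sequence:
        if kind == "set":
            balances[account] = value
        elif balances[account] != value:  # a "get" must return the current balance
            return False
    return True


def sequentially_consistent(a: List[Op], b: List[Op]) -> bool:
    return any(legal(s) for s in interleavings(a, b))


client1 = [("set", "x", 1), ("set", "y", 2)]
client2 = [("get", "y", 2), ("get", "x", 0)]
naive_result = sequentially_consistent(client1, client2)
```

For the naive replication example no legal interleaving exists, which agrees with the observation that the execution is not even sequentially consistent, let alone linearizable.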
Passive (primary-backup) replication: In the passive or primary-backup model of replication for fault tolerance (Figure 18.3), there is at any one time a single primary replica manager and one or more secondary replica managers (‘backups’ or ‘slaves’). In the pure form of the model, front ends communicate only with the primary replica manager to obtain the service. The primary replica manager executes the operations and sends copies of the updated data to the backups. If the primary fails, one of the backups is promoted to act as the primary.
The sequence of events when a client requests an operation to be performed is as follows:
1) Request: The front end issues the request, containing a unique identifier, to the primary replica manager.
2) Coordination: The primary takes each request atomically, in the order in which it receives it. It checks the unique identifier, in case it has already executed the request, and if so it simply resends the response.
3) Execution: The primary executes the request and stores the response.
4) Agreement: If the request is an update, then the primary sends the updated state, the response and the unique identifier to all the backups. The backups send an acknowledgement.
5) Response: The primary responds to the front end, which hands the response back to the client.
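The five phases of passive replication can be sketched in a single process as follows. This is a simplified illustration under stated assumptions: the class names are hypothetical, the service is a key-value store, communication is modelled by direct method calls that never fail, and neither failure detection nor promotion of a backup is shown.

```python
from typing import Dict, List, Tuple

Response = Tuple[str, str, int]


class Backup:
    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.responses: Dict[str, Response] = {}

    def apply_update(self, request_id: str, state: Dict[str, int],
                     response: Response) -> bool:
        # Store the updated state, the response and the identifier, then acknowledge.
        self.state = dict(state)
        self.responses[request_id] = response
        return True


class Primary:
    def __init__(self, backups: List[Backup]) -> None:
        self.backups = backups
        self.state: Dict[str, int] = {}
        self.responses: Dict[str, Response] = {}

    def handle(self, request_id: str, key: str, value: int) -> Response:
        # Coordination: a repeated identifier means the request was already
        # executed, so the stored response is simply resent.
        if request_id in self.responses:
            return self.responses[request_id]
        # Execution: perform the update and store the response.
        self.state[key] = value
        response: Response = ("ok", key, value)
        self.responses[request_id] = response
        # Agreement: send state, response and identifier to every backup
        # and wait for their acknowledgements.
        acks = [b.apply_update(request_id, self.state, response) for b in self.backups]
        if not all(acks):
            raise RuntimeError("a backup did not acknowledge the update")
        # Response: reply to the front end.
        return response
```

Because every backup also stores the response under the request identifier, a backup that later becomes the primary could answer a retransmitted request without executing it twice.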
This system obviously implements linearizability if the primary is correct, since the primary sequences all the operations upon the shared objects. If the primary fails, then the system retains linearizability if a single backup becomes the new primary and if the new system configuration takes over exactly where the last left off. That is if:
The primary is replaced by a unique backup (if two clients began using two backups, then the system could perform incorrectly).
The replica managers that survive agree on which operations had been performed at the point when the replacement primary takes over.
Both of these requirements are met if the replica managers (primary and backups) are organized as a group and if the primary uses view-synchronous group communication to send the updates to the backups. The first of the above two requirements is then easily satisfied. When the primary crashes, the communication system eventually delivers a new view to the surviving backups, one that excludes the old primary. The backup that replaces the primary can be chosen by any function of that view. For example, the backups can choose the first member in that view as the replacement. That backup can register itself as the primary with a name service that the clients consult when they suspect that the primary has failed (or when they require the service in the first place).
The second requirement is also satisfied, by the ordering property of view-synchrony and the use of stored identifiers to detect repeated requests. The view-synchronous semantics guarantee that either all the backups or none of them will deliver any given update before delivering the new view. Thus the new primary and the surviving backups all agree on whether any particular client's update has or has not been processed.
Active replication: In the active model of replication for fault tolerance (see Figure 18.4), the replica managers are state machines that play equivalent roles and are organized as a group. Front ends multicast their requests to the group of replica managers, and all the replica managers process the request independently but identically and reply. If any replica manager crashes, this need have no impact upon the performance of the service, since the remaining replica managers continue to respond in the normal way. We shall see that active replication can tolerate Byzantine failures, because the front end can collect and compare the replies it receives.
Under active replication, the sequence of events when a client requests an operation to be performed is as follows:
1) Request: The front end attaches a unique identifier to the request and multicasts it to the group of replica managers, using a totally ordered, reliable multicast primitive. The front end is assumed to fail by crashing at worst. It does not issue the next request until it has received a response.
2) Coordination: The group communication system delivers the request to every correct replica manager in the same (total) order.
3) Execution: Every replica manager executes the request. Since they are state machines and since requests are delivered in the same total order, correct replica managers all process the request identically. The response contains the client's unique request identifier.
4) Agreement: No agreement phase is needed, because of the multicast delivery semantics.
5) Response: Each replica manager sends its response to the front end. The number of replies that the front end collects depends upon the failure assumptions and the multicast algorithm. If, for example, the goal is to tolerate only crash failures and the multicast satisfies uniform agreement and ordering properties, then the front end passes the first response to arrive back to the client and discards the rest (it can distinguish these from responses to other requests by examining the identifier in the response).
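The steps of active replication can be illustrated with deterministic state machines and a front end that takes the first matching reply. This is a sketch under strong simplifying assumptions: the totally ordered, reliable multicast is replaced by a plain loop that delivers each request to every non-crashed replica manager in the same order, only crash failures are modelled, and the class names and the bank-balance state are hypothetical.

```python
from typing import List, Tuple


class ReplicaManager:
    """A deterministic state machine holding one account balance."""

    def __init__(self) -> None:
        self.balance = 0
        self.crashed = False

    def deliver(self, request_id: int, amount: int) -> Tuple[int, int]:
        # Execution: the same request applied to the same state gives the same result.
        self.balance += amount
        return (request_id, self.balance)  # the response carries the request identifier


class FrontEnd:
    def __init__(self, group: List[ReplicaManager]) -> None:
        self.group = group
        self.next_id = 0

    def invoke(self, amount: int) -> int:
        # Request: attach a unique identifier and send to the whole group.
        self.next_id += 1
        request_id = self.next_id
        # Coordination (stand-in): every correct replica manager sees requests
        # in the same order because they are delivered one at a time.
        replies = [rm.deliver(request_id, amount)
                   for rm in self.group if not rm.crashed]
        # Response: pass back the first reply that matches this request.
        for reply_id, result in replies:
            if reply_id == request_id:
                return result
        raise RuntimeError("no correct replica manager replied")
```

To tolerate Byzantine failures the front end would instead have to collect and compare several replies, as the text notes; that comparison is not shown here.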
This system achieves sequential consistency. All correct replica managers process the same sequence of requests. The reliability of the multicast ensures that every correct replica manager processes the same set of requests, and the total order ensures that they process them in the same order. Since they are state machines, they all end up with the same state as one another after each request. Each front end's requests are served in FIFO order (because the front end awaits a response before making the next request), which is the same as ‘program order’. This ensures sequential consistency.
If clients do not communicate with other clients while waiting for responses to their requests, then their requests are processed in happened-before order. If clients are multithreaded and can communicate with one another while awaiting responses from the service, then to guarantee request processing in happened-before order we would have to replace the multicast with one that is both causally and totally ordered.
The active replication system does not achieve linearizability. This is because the total order in which the replica managers process requests is not necessarily the same as the real-time order in which the clients made their requests. Schneider [1990] describes how, in a synchronous system with approximately synchronized clocks, the total order in which the replica managers process requests can be based on the order of physical timestamps that the front ends supply with their requests. This does not guarantee linearizability, because the timestamps are not perfectly accurate; but it approximates it.
*****