Microsoft SQL Server AlwaysOn Solutions Guide For High Availability and Disaster Recovery
LeRoy Tuttle, Jr.
Contributors: Lindsey Allen, Justin Erickson, Min He, Cephas Lin, Sanjay Mishra
Reviewers: Kevin Farlee, Shahryar G. Hashemi (Motricity), Allan Hirt (SQLHA), Alexei Khalyako, Wolfgang Kutschera (Bwin Party), Charles Matthews, Ayad Shammout (Caregroup), David P. Smith (ServiceU), Juergen Thomas, Benjamin Wright-Jones
Summary: This white paper discusses how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.
A key goal of this paper is to establish a common context for related discussions between business stakeholders, technical decision makers, system architects, infrastructure engineers, and database administrators.
Category: Quick Guide
Applies to: SQL Server 2012
Source: White paper (link to source content)
E-book publication date: May 2012
32 pages
Contents
High Availability and Disaster Recovery Concepts
Describing High Availability
Planned vs. Unplanned Downtime
Degraded Availability
Quantifying Downtime
Recovery Objectives
Justifying ROI or Opportunity Cost
Monitoring Availability Health
Planning for Disaster Recovery
Overview: High Availability with Microsoft SQL Server 2012
SQL Server AlwaysOn
Significantly Reduce Planned Downtime
Eliminate Idle Hardware and Improve Cost Efficiency and Performance
Easy Deployment and Management
Contrasting RPO and RTO Capabilities
SQL Server AlwaysOn Layers of Protection
Infrastructure Availability
Windows Operating System
Windows Server Failover Clustering
WSFC Cluster Validation Wizard
WSFC Quorum Modes and Voting Configuration
WSFC Disaster Recovery through Forced Quorum
SQL Server Instance Level Protection
Availability Improvements - SQL Server Instances
AlwaysOn Failover Cluster Instances
Database Availability
AlwaysOn Availability Groups
Availability Group Failover
Availability Group Listener
Availability Improvements - Databases
Client Connectivity Recommendations
Conclusion
High Availability and Disaster Recovery Concepts
You can make the best selection of a database technology for a high availability and disaster recovery solution when all stakeholders have a shared understanding of the related business drivers, challenges, and objectives involved in planning, managing, and measuring the recovery time objective (RTO) and recovery point objective (RPO).
Readers who are familiar with these concepts can move ahead to the "Overview: High Availability with Microsoft SQL Server 2012" section of this paper.
Describing High Availability
For a given software application or service, high availability is ultimately measured in terms of the end user's experience and expectations. The tangible and perceived business impact of downtime may be expressed in terms of information loss, property damage, decreased productivity, opportunity costs, contractual damages, or the loss of goodwill.
The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with technical capabilities and infrastructure costs.
A platform is considered highly available per the agreement and expectations of customers and stakeholders. The availability of a system can be expressed as this calculation:
$$\text{Availability} = \frac{\text{Actual uptime}}{\text{Expected uptime}} \times 100\%$$
The resulting value is often expressed by industry in terms of the number of 9s that the solution provides, which conveys an annual amount of possible uptime or, conversely, of downtime.
Number of 9s | Availability percentage | Total annual downtime
2 | 99% | 3 days, 15 hours
3 | 99.9% | 8 hours, 45 minutes
4 | 99.99% | 52 minutes, 34 seconds
5 | 99.999% | 5 minutes, 15 seconds
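The availability formula and the table of 9s can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a 365-day year (leap years ignored); the function names are illustrative, not part of any SQL Server tooling.

```python
from datetime import timedelta

# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def availability_percent(actual_uptime_min: float, expected_uptime_min: float) -> float:
    """Availability = actual uptime / expected uptime x 100%."""
    return 100.0 * actual_uptime_min / expected_uptime_min


def annual_downtime(nines: int) -> timedelta:
    """Maximum annual downtime for an availability of `nines` 9s (3 -> 99.9%)."""
    unavailable_fraction = 10.0 ** -nines
    return timedelta(minutes=MINUTES_PER_YEAR * unavailable_fraction)


# Print the allowed downtime for 2 through 5 nines.
for n in range(2, 6):
    pct = 100.0 * (1 - 10.0 ** -n)
    print(f"{n} nines ({pct:g}%): {annual_downtime(n)}")
```

For two 9s the exact figure is 3 days, 15 hours, 36 minutes, which the table truncates to 3 days, 15 hours; for four 9s it is 52 minutes, 33.6 seconds, which the table rounds to 34 seconds.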
Planned vs. Unplanned Downtime
System outages are either anticipated and planned for, or they are the result of an unplanned failure. Downtime need not be considered negatively if it is appropriately managed. There are two key types of foreseeable downtime:
Planned maintenance. A time window is preannounced and coordinated for planned maintenance tasks such as software patching, hardware upgrades, password updates, offline reindexing, data loading, or the rehearsal of disaster recovery procedures. Deliberate, well-managed operational procedures should minimize downtime and prevent any data loss. Planned maintenance activities can be seen as investments needed to prevent or mitigate other, potentially more severe, unplanned outage scenarios.
Unplanned outage. System-level, infrastructure, or process failures may occur that are unplanned or uncontrollable, or that are foreseeable but considered either too unlikely to occur or to have an acceptable impact. A robust high availability solution detects these types of failures, automatically recovers from the outage, and then reestablishes fault tolerance.
When establishing SLAs for high availability, you should calculate separate key performance indicators (KPIs) for planned maintenance activities and unplanned downtime. This approach allows you to contrast your investment in planned maintenance activities against the benefit of avoiding unplanned downtime.
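Keeping the two KPIs separate is mostly a matter of tagging each outage when it is recorded. The sketch below assumes a simple hypothetical record (`Outage`, with a duration in minutes and a planned flag) and reports availability for the period three ways: planned downtime only, unplanned downtime only, and both combined.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    planned: bool  # True for a preannounced, coordinated maintenance window


def downtime_kpis(outages: list[Outage], period_minutes: float) -> dict[str, float]:
    """Availability percentages for a reporting period, with planned and
    unplanned downtime tracked as separate KPIs."""
    planned = sum(o.minutes for o in outages if o.planned)
    unplanned = sum(o.minutes for o in outages if not o.planned)

    def pct(down: float) -> float:
        return 100.0 * (period_minutes - down) / period_minutes

    return {
        "planned": pct(planned),
        "unplanned": pct(unplanned),
        "overall": pct(planned + unplanned),
    }


# Hypothetical reporting period: 10,000 minutes with one maintenance window
# and one short unplanned outage.
kpis = downtime_kpis([Outage(100, planned=True), Outage(1, planned=False)], 10_000)
```

Here the unplanned KPI stays at four 9s even though the maintenance window pulls overall availability down, which is exactly the contrast the separate KPIs are meant to expose.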
Degraded Availability
High availability should not be considered an all-or-nothing proposition. As an alternative to a complete outage, it is often acceptable to the end user for a system to be partially available, or to have limited functionality or degraded performance. These varying degrees of availability include:
Read-only and deferred operations. During a maintenance window, or during a phased disaster recovery, data retrieval is still possible, but new workflows and background processing may be temporarily halted or queued.
Data latency and application responsiveness. Due to a heavy workload, a processing backlog, or a partial platform failure, limited hardware resources may be overcommitted or undersized. User experience may suffer, but work may still get done in a less productive manner.
Partial, transient, or impending failures. Robustness in the application logic or hardware stack allows the system to retry or self-correct upon encountering an error. These types of issues may appear to the end user as data latency or poor application responsiveness.
Partial end-to-end failure. Planned or unplanned outages may occur gracefully within vertical layers of the solution stack (infrastructure, platform, and application), or horizontally between different functional components. Users may experience partial success or degradation, depending upon the features or components that are affected.
The acceptability of these suboptimal scenarios should be considered as part of a spectrum of degraded availability leading up to a complete outage, and as intermediate steps in a phased disaster recovery.
Quantifying Downtime
When downtime does occur, either planned or unplanned, the primary business goal is to bring the system back online and minimize data loss. Every minute of downtime has direct and indirect costs. With unplanned downtime, you must balance the time and effort needed to determine why the outage occurred, what the current system state is, and what steps are needed to recover from the outage.
At a predetermined point in any outage, you should make or seek the business decision to stop investigating the outage or performing maintenance tasks, recover from the outage by bringing the system back online, and, if needed, reestablish fault tolerance.
Recovery Objectives
Data redundancy is a key component of a high availability database solution. Transactional activity on your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary instances. When an outage occurs, transactions that were in flight may be rolled back, or they may be lost on the secondary instances due to delays in data propagation.
You can both measure the impact and set recovery goals in terms of how long it takes to get back in business and how much time latency there is in the last transaction recovered:
Recovery Time Objective (RTO). This is the duration of the outage. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure. However, the primary goal is to restore full service to the point that new transactions can take place.
Recovery Point Objective (RPO). This is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending upon the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.
You should use RTO and RPO values as goals that indicate business tolerance for downtime and acceptable data loss, and as metrics for monitoring availability health.
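Measured after an incident, both objectives reduce to differences between timestamps. This minimal sketch assumes you can establish four points in time from logs (failure, restoration of full service, last commit before the failure, last commit present after recovery); the function names and the incident timeline are hypothetical.

```python
from datetime import datetime, timedelta


def measured_rto(failure_at: datetime, full_service_at: datetime) -> timedelta:
    """Duration of the outage: failure until new transactions can take place again."""
    return full_service_at - failure_at


def measured_rpo(last_commit_before_failure: datetime,
                 last_commit_recovered: datetime) -> timedelta:
    """Gap between the last transaction committed before the failure and the
    most recent committed data available after recovery."""
    return max(timedelta(0), last_commit_before_failure - last_commit_recovered)


def within_goals(rto: timedelta, rpo: timedelta,
                 rto_goal: timedelta, rpo_goal: timedelta) -> bool:
    """True if the measured outage met both business goals."""
    return rto <= rto_goal and rpo <= rpo_goal


# Hypothetical incident timeline.
rto = measured_rto(datetime(2012, 5, 1, 10, 0, 0), datetime(2012, 5, 1, 10, 4, 30))
rpo = measured_rpo(datetime(2012, 5, 1, 9, 59, 58), datetime(2012, 5, 1, 9, 59, 55))
```

With these sample values the outage lasted 4.5 minutes and lost 3 seconds of committed work, so it would meet a 5-minute RTO and 5-second RPO but not a 1-second RPO.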
Justifying ROI or Opportunity Cost
The business costs of downtime may be either financial or in the form of customer goodwill. These costs may accrue with time, or they may be incurred at a certain point in the outage window. In addition to projecting the cost of incurring an outage with a given recovery time and data recovery point, you can also calculate the business process and infrastructure investments needed to attain your RTO and RPO goals or to avoid the outage altogether. These investment themes should include:
Avoiding downtime. Outage recovery costs are avoided altogether if an outage doesn't occur in the first place. Investments include the cost of fault-tolerant and redundant hardware or infrastructure, distributing workloads across isolated points of failure, and planned downtime for preventive maintenance.
Automating recovery. If a system failure occurs, you can greatly mitigate the impact of downtime on the customer experience through automatic and transparent recovery.
Resource utilization. Secondary or standby infrastructure can sit idle, awaiting an outage, or it can be leveraged for read-only workloads, or to improve overall system performance by distributing workloads across all available hardware.
For given RTO and RPO goals, the needed availability and recovery investments, combined with the projected costs of downtime, can be expressed and justified as a function of time. During an actual outage, this allows you to make cost-based decisions based on the elapsed downtime.
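One simple way to express downtime cost as a function of time is to combine a cost that accrues per minute with one-time costs that are incurred once the outage reaches a given point. This sketch assumes exactly that two-part model; the rates, thresholds, and function names are hypothetical and would come from your own business analysis.

```python
from typing import Optional


def downtime_cost(elapsed_min: float, cost_per_min: float,
                  step_costs: dict[float, float]) -> float:
    """Projected cost of an outage that has lasted `elapsed_min` minutes.

    cost_per_min: cost that accrues continuously with time.
    step_costs:   {minutes_into_outage: one-time cost incurred at that point}.
    """
    accrued = elapsed_min * cost_per_min
    one_time = sum(cost for at_min, cost in step_costs.items() if elapsed_min >= at_min)
    return accrued + one_time


def minutes_until_next_step(elapsed_min: float,
                            step_costs: dict[float, float]) -> Optional[float]:
    """Time left before the next one-time cost is incurred, or None if none remain."""
    upcoming = [at_min for at_min in step_costs if at_min > elapsed_min]
    return min(upcoming) - elapsed_min if upcoming else None


# Hypothetical figures: 100 per minute, plus a 5,000 penalty at the one-hour mark.
steps = {60: 5000}
```

During an outage, `minutes_until_next_step` indicates how much investigation time remains before the next cost threshold, which is one input to the predetermined recovery decision point described earlier.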
Monitoring Availability Health
From an operational point of view, during an actual outage, you should not attempt to consider all relevant variables and calculate ROI or opportunity costs in real time. Instead, you should monitor data latency on your standby instances as a proxy for expected RPO.
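As a sketch of using standby latency as an RPO proxy, assume you already collect, per database, the last commit time on the primary and on each secondary replica (for example, from the AlwaysOn dynamic management views). The replica names and timestamps below are hypothetical.

```python
from datetime import datetime, timedelta


def replica_lag(primary_last_commit: datetime,
                replica_last_commit: dict[str, datetime]) -> dict[str, timedelta]:
    """How far each standby replica trails the primary. If the primary failed
    now, roughly this much recent work could be missing on that replica."""
    return {name: max(timedelta(0), primary_last_commit - ts)
            for name, ts in replica_last_commit.items()}


def replicas_over_rpo(lags: dict[str, timedelta], rpo_goal: timedelta) -> list[str]:
    """Replicas whose current latency already exceeds the RPO goal (alert on these)."""
    return sorted(name for name, lag in lags.items() if lag > rpo_goal)


# Hypothetical sample: a local partner and a remote disaster recovery replica.
lags = replica_lag(datetime(2012, 5, 1, 12, 0, 10),
                   {"SQLNODE2": datetime(2012, 5, 1, 12, 0, 9),
                    "DRNODE1": datetime(2012, 5, 1, 11, 59, 40)})
```

Alerting on `replicas_over_rpo` gives operators a current estimate of the data loss a failover to each replica would incur, without any ROI calculation in real time.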
In the event of an outage, you should also limit the initial time spent investigating the root cause. Instead, focus on validating the health of your recovery environment, and then rely upon detailed system logs and secondary copies of data for subsequent forensic analysis.
Planning for Disaster Recovery
While high availability efforts entail what you do to prevent an outage, disaster recovery efforts address what is done to reestablish high availability after the outage.
As much as possible, disaster recovery procedures and responsibilities should be formulated before an actual outage occurs. Based upon active monitoring and alerts, the decision to initiate an automated or manual failover and recovery plan should be tied to preestablished RTO and RPO thresholds. The scope of a sound disaster recovery plan should include:
Granularity of failure and recovery. Depending upon the location and type of failure, you can take corrective action at different levels; that is, data center, infrastructure, platform, application, or workload.
Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and diagnostic queries should all be readily accessible by appropriate parties.
Coordination of dependencies. Within the application stack, and across stakeholders, what are the system and business dependencies?
Decision tree. A predetermined, repeatable, validated decision tree that includes role responsibilities, fault triage, failover criteria in terms of RPO and RTO goals, and prescribed recovery steps.
Validation. After taking steps to recover from the outage, what must be done to verify that the system has returned to normal operations?
Documentation. Capture all of the above items in a set of documentation, with sufficient detail and clarity so that a third-party team can execute the recovery plan with minimal assistance. This type of documentation is commonly referred to as a runbook or a cookbook.
Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations for RTO goals, and consider regularly rotating which site hosts primary production, alternating between the primary site and each of the disaster recovery sites.
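The failover-criteria part of such a decision tree can be made explicit. This is a deliberately simplified, hypothetical sketch: it considers only elapsed downtime, an estimated repair time for the primary, and the standby's current lag, and it ignores the time the failover itself takes. A real runbook would add fault triage and role responsibilities.

```python
from datetime import timedelta
from enum import Enum


class Action(Enum):
    RECOVER_IN_PLACE = "bring the primary back online"
    FAIL_OVER = "fail over to the standby"
    ESCALATE = "escalate for a business decision"


def failover_decision(elapsed: timedelta, est_repair: timedelta, standby_lag: timedelta,
                      rto_goal: timedelta, rpo_goal: timedelta) -> Action:
    """One branch of a hypothetical DR decision tree, driven only by RTO/RPO goals."""
    # 1. If the primary can be repaired before the RTO goal expires, avoid failover
    #    (and therefore avoid losing the standby's lag worth of data).
    if elapsed + est_repair <= rto_goal:
        return Action.RECOVER_IN_PLACE
    # 2. Otherwise fail over, but only if the expected data loss meets the RPO goal.
    if standby_lag <= rpo_goal:
        return Action.FAIL_OVER
    # 3. Neither option meets both goals: a stakeholder must trade RTO against RPO.
    return Action.ESCALATE
```

Encoding the criteria this way makes them repeatable and testable during recovery rehearsals, rather than re-derived under pressure during an outage.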
Overview: High Availability with Microsoft SQL Server 2012
Achieving the required RPO and RTO goals involves ensuring continuous uptime of critical applications and protection of critical data from unplanned and planned downtime. SQL Server provides a set of features and capabilities that can help achieve those goals while keeping cost and complexity low.
Readers who have a high-level familiarity with the new AlwaysOn capabilities can move ahead to the deeper coverage in the "SQL Server AlwaysOn Layers of Protection" section of this paper.
SQL Server AlwaysOn
AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution. It can provide data and hardware redundancy within and across data centers, and it improves application failover time to increase the availability of your mission-critical applications. AlwaysOn provides flexibility in configuration and enables reuse of existing hardware investments.
An AlwaysOn solution can leverage two major SQL Server 2012 features for configuring availability at both the database and the instance level:
AlwaysOn Availability Groups, new in SQL Server 2012, greatly enhance the capabilities of database mirroring and help ensure availability of application databases. When configured for synchronous commit, they enable zero data loss through log-based data movement, providing data protection without shared disks. Availability groups provide an integrated set of options including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.
AlwaysOn Failover Cluster Instances (FCIs) enhance the SQL Server failover clustering feature and support multisite clustering across subnets, which enables cross-data-center failover of SQL Server instances. Faster and more predictable instance failover is another key benefit that enables faster application recovery.
Significantly Reduce Planned Downtime
The leading cause of application downtime in many organizations is planned downtime caused by operating system patching, hardware maintenance, and so on. This can constitute almost 80 percent of the outages in an IT environment.
SQL Server 2012 helps reduce planned downtime significantly by reducing patching requirements and enabling more online maintenance operations:
Windows Server Core. SQL Server 2012 supports deployments on Windows Server Core, a minimal, streamlined deployment option for Windows Server 2008 and Windows Server 2008 R2. This operating system configuration can reduce planned downtime by minimizing operating system patching requirements by as much as 60 percent.
Online Operations. Enhanced support for online operations like LOB reindexing and adding columns with default values helps to reduce downtime during database maintenance operations.
Rolling Upgrade and Patching. AlwaysOn features facilitate rolling upgrades and patching of instances, which helps significantly to reduce application downtime.
SQL Server on Hyper-V. SQL Server instances hosted in the Hyper-V environment receive the additional benefit of Live Migration, which enables you to migrate virtual machines between hosts with zero downtime. Administrators can perform maintenance operations on the host without impacting applications.
Eliminate Idle Hardware and Improve Cost Efficiency and Performance
Typical high availability solutions involve deployment of costly, redundant, passive servers. AlwaysOn Availability Groups enable you to utilize secondary database replicas on otherwise passive or idle servers for read-only workloads such as SQL Server Reporting Services report queries or backup operations. The ability to simultaneously utilize both the primary and secondary database replicas helps improve performance of all workloads due to better resource balancing across your server hardware investments.
Easy Deployment and Management
Features such as the Configuration Wizard, support for the Windows PowerShell command-line interface, dashboards, dynamic management views (DMVs), policy-based management, and System Center integration help simplify deployment and management of availability groups.
Contrasting RPO and RTO Capabilities
The business goals for Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be key drivers in selecting a SQL Server technology for your high availability and disaster recovery solution. This table offers a rough comparison of the type of results that the different solutions may achieve:
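High availability and disaster recovery SQL Server solution | Potential data loss (RPO) | Potential recovery time (RTO) | Automatic failover | Readable secondaries (1)
AlwaysOn Availability Group - synchronous commit | Zero | Seconds | Yes (4) | 0 - 2
AlwaysOn Availability Group - asynchronous commit | Seconds | Minutes | No | 0 - 4
AlwaysOn Failover Cluster Instance | NA (5) | Seconds to minutes | Yes | NA
Database Mirroring (2) - High-safety (sync + witness) | Zero | Seconds | Yes | NA
Database Mirroring (2) - High-performance (async) | Seconds (6) | Minutes (6) | No | NA
Log Shipping | Minutes (6) | Minutes to hours (6) | No | Not during a restore
Backup, Copy, Restore (3) | Hours (6) | Hours to days (6) | No | Not during a restore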
(1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
(2) This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
(3) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
(4) Automatic failover of an availability group is not supported to or from a failover cluster instance.
(5) The FCI itself doesn't provide data protection; data loss is dependent upon the storage system implementation.
(6) Highly dependent upon the workload, data volume, and failover procedures.
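The comparison can be turned into a rough screening step. This sketch encodes only three rows of the table (failover cluster instance, log shipping, and backup/copy/restore) on an assumed ordinal scale of seconds < minutes < hours < days, takes the upper end of ranges such as "minutes to hours", and skips solutions whose data loss is storage-dependent (NA). The scale and names are illustrative assumptions, not a Microsoft sizing method.

```python
# Assumed ordinal scale for the qualitative time buckets used in the table.
BUCKETS = {"seconds": 0, "minutes": 1, "hours": 2, "days": 3}

# (solution, worst-case RPO bucket or None if storage-dependent, worst-case RTO bucket).
# For ranges such as "minutes to hours", the upper end is used.
SOLUTIONS = [
    ("AlwaysOn Failover Cluster Instance", None, "minutes"),
    ("Log Shipping", "minutes", "hours"),
    ("Backup, Copy, Restore", "hours", "days"),
]


def candidates(rpo_goal: str, rto_goal: str) -> list[str]:
    """Solutions whose worst-case RPO and RTO both fit within the goals."""
    result = []
    for name, rpo, rto in SOLUTIONS:
        if rpo is None:  # data loss depends on the storage system; evaluate separately
            continue
        if BUCKETS[rpo] <= BUCKETS[rpo_goal] and BUCKETS[rto] <= BUCKETS[rto_goal]:
            result.append(name)
    return result
```

Using the worst case of each range keeps the screen conservative; footnote (6) is a reminder that actual results still depend on workload and procedures.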
SQL Server AlwaysOn Layers of Protection
SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical and physical layers of infrastructure and application components. Historically, it has been a common practice to separate duties and responsibilities among the various audiences and roles involved, such that each was predominantly concerned with only a portion of those solution layers.
This section of the paper walks through a deeper description of each of those layers, and offers rationale and guidance for your design discussions and implementation decisions.
A successful SQL Server AlwaysOn solution requires understanding and collaboration across these layers:
Infrastructure level. Server-level fault tolerance and inter-node network communication leverage Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.
SQL Server instance level. A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server instance that is installed across, and can fail over to, server nodes in a WSFC cluster. The nodes that host the FCI are attached to robust symmetric shared storage (SAN or SMB).
Database level. An availability group is a set of user databases that fail over together. An availability group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.
Client connectivity. Database client applications can connect directly to a SQL Server instance network name, or they may connect to a virtual network name (VNN) that is bound to an availability group listener. The VNN abstracts the WSFC cluster and availability group topology, logically redirecting connection requests to the appropriate SQL Server instance and database replica.
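The database-level rules above lend themselves to a simple pre-deployment check. This sketch models only those two rules (one to four secondary replicas, and each replica on a different WSFC node); the class, function, and node names are hypothetical and not part of any SQL Server API.

```python
from dataclasses import dataclass, field


@dataclass
class AvailabilityGroup:
    name: str
    primary_node: str                                          # WSFC node hosting the primary replica
    secondary_nodes: list[str] = field(default_factory=list)  # nodes hosting secondary replicas


def topology_problems(ag: AvailabilityGroup) -> list[str]:
    """Check an availability group against the database-level layout rules."""
    problems = []
    count = len(ag.secondary_nodes)
    if not 1 <= count <= 4:
        problems.append(f"{ag.name}: needs one to four secondary replicas, has {count}")
    nodes = [ag.primary_node, *ag.secondary_nodes]
    if len(set(nodes)) != len(nodes):
        problems.append(f"{ag.name}: every replica must be on a different WSFC node")
    return problems
```

An empty result means the planned layout satisfies both rules; each problem string names the availability group so several groups can be checked in one pass.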
The logical topology of a representative AlwaysOn solution is illustrated in this diagram:
Infrastructure Availability
Both AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances leverage the Windows Server operating system and WSFC as a platform technology. More than ever before, successful Microsoft SQL Server database administrators will rely upon a solid understanding of these technologies.
Windows Operating System
SQL Server relies upon the Windows platform to provide foundational infrastructure and services for networking, storage, security, patching, and monitoring.
The different editions of SQL Server 2012 progressively build upon the increasing capabilities and capacity of similar editions of the Windows Server 2008 R2 operating system, including Windows Server 2008 R2 Standard, Windows Server 2008 R2 Enterprise, and Windows Server 2008 R2 Datacenter.
For more information, see Hardware and Software Requirements for Installing SQL Server 2012 (http://msdn.microsoft.com/en-us/library/ms143506(SQL.110).aspx).
Windows Server Core Installation Option
As a key high availability feature, SQL Server 2012 supports deployment on the Server Core installation option in Windows Server 2008 or later. The Server Core installation option provides a minimal environment for running specific server roles with limited functionality and very limited GUI application support. By default, only necessary services and a command-prompt environment are enabled.
This mode of operation reduces the operating system attack surface and system overhead, and it can significantly reduce ongoing maintenance, servicing, and patching requirements.
A key consideration for deploying SQL Server 2012 on Windows Server Core is that all deployment, configuration, administration, and maintenance of SQL Server and of the operating system must be done using a scripting environment such as Windows PowerShell, or through the use of command-line or remote tools.
Optimizing SQL Server for Private Cloud
High availability and disaster recovery scenarios are increasingly critical in the Private Cloud environment. Deploy SQL Server to your Private Cloud to help ensure that your compute, network, and storage resources are used efficiently, reducing both physical footprint and capital and operational expenses. A Private Cloud deployment helps you consolidate deployments, scale your resources efficiently, and deploy resources on demand without compromising control.
In addition to Windows Server Failover Clustering support for both Hyper-V host and guest systems, SQL Server also supports Live Migration, which is the ability to move virtual machines between hosts with no discernible downtime. Live Migration also works in conjunction with guest clustering.
For more information, see Private Cloud Computing - Optimizing SQL Server for Private Cloud (http://www.microsoft.com/SqlServerPrivateCloud).
Windows Server Failover Clustering
Windows Server Failover Clustering (WSFC) provides infrastructure features that support the high availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server. If a WSFC cluster node or service fails, the services or resources that were hosted on that node can be automatically or manually transferred to another available node in a process known as failover. With AlwaysOn solutions, this process applies to both FCIs and availability groups.
The nodes in the WSFC cluster work together to collectively provide these types of capabilities:
Distributed metadata and notifications. WSFC service and hosted application metadata is maintained on each node in the cluster. This metadata includes WSFC configuration and status in addition to hosted application settings. Changes to the metadata or status on one node are automatically propagated to the other nodes in the cluster.
Resource management. Individual nodes in the cluster may provide physical resources such as direct-attached storage (DAS), network interfaces, and access to shared disk storage. Hosted applications, such as SQL Server, register themselves as a cluster resource, and they can configure startup and health dependencies upon other resources.
Health monitoring. Inter-node and primary node health detection is accomplished through a combination of heartbeat-style network communications and resource monitoring. The overall health of the cluster is determined by the votes of a quorum of nodes in the cluster.
Failover coordination. Each resource is configured to be hosted on a primary node, and each can be automatically or manually transferred to one or more secondary nodes. A health-based failover policy controls automatic transfer of resource ownership between nodes. Nodes and hosted applications are notified when failover occurs so that they can react appropriately.
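To make the idea of "votes of a quorum of nodes" concrete, here is a minimal sketch of a simple vote-majority check. WSFC supports several quorum modes and voting configurations; this shows only the basic majority rule, and the node names and vote counts are hypothetical.

```python
def has_quorum(nodes: dict[str, tuple[int, bool]]) -> bool:
    """nodes maps node name -> (votes, is_online). Quorum requires that the
    online votes form a strict majority of all configured votes."""
    total = sum(votes for votes, _ in nodes.values())
    online = sum(votes for votes, up in nodes.values() if up)
    return 2 * online > total  # strict majority; a tie does not keep quorum


# Hypothetical three-node cluster with one vote per node and one node down.
cluster = {"NODE1": (1, True), "NODE2": (1, True), "NODE3": (1, False)}
```

Because a tie is not a majority, a four-node cluster with one vote per node can lose only one node, the same tolerance as a three-node cluster. This is one reason voting configuration matters.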
For more information, see Windows Server | Failover Clustering and Node Balancing (http://www.microsoft.com/windowsserver2008/en/us/failover-clustering-main.aspx).
Note: It is now critically important that database administrators understand the inner workings of WSFC clusters and quorum management. AlwaysOn health monitoring, management, and failure recovery steps are all intrinsically tied to your WSFC configuration.
WSFC Storage Configurations
Windows Server Failover Clustering relies upon each node in the cluster to manage its connected storage devices, disk volumes, and file system. WSFC assumes that the storage subsystem is extremely robust; therefore, if the storage device attached to a node is unavailable, the cluster node is considered to be at fault.
For write-based operations, a disk volume is logically attached to a single cluster node at a time using a SCSI-3 persistent reservation. Depending upon storage subsystem capabilities and configuration, if a node fails, logical ownership of the disk volume can be transferred to another node in the cluster.
SQL Server AlwaysOn solutions both leverage and are restricted to certain WSFC storage configuration combinations, including:
Direct-attached vs. remote. Storage devices are either directly physically attached to the server, or they are presented by a remote device through a network or host bus adapter (HBA). Remote storage technologies include Storage Area Network (SAN)-based solutions such as iSCSI or Fibre Channel, as well as Server Message Block (SMB) file share-based solutions.
Symmetric vs. asymmetric. Storage devices are considered symmetric if exactly the same logical disk volume configuration and file paths are presented to each node in the cluster. The physical implementation and capacity of the underlying disk volumes can vary.
Dedicated vs. shared. Dedicated storage is reserved for use and assigned to a single node in the cluster. Shared storage is accessible to multiple nodes in the cluster. Control and ownership of compliant shared storage devices can be transferred from one node to another using SCSI-3 protocols. WSFC supports the concurrent multi-node hosting of cluster shared volumes for file sharing purposes. However, SQL Server does not support concurrent multi-node access to a shared volume.
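The symmetric/asymmetric distinction depends only on the logical view each node sees, not on the physical disks behind it. This sketch assumes you have gathered, per node, the set of logical volume and file paths presented to it; the node names and paths are hypothetical, and real validation is performed by the cluster validation wizard.

```python
def missing_paths(presented: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each node, the logical paths that some other node sees but it does not."""
    all_paths = set().union(*presented.values())
    return {node: all_paths - paths
            for node, paths in presented.items() if all_paths - paths}


def is_symmetric(presented: dict[str, set[str]]) -> bool:
    """Symmetric if every node is presented exactly the same logical paths."""
    return not missing_paths(presented)


# Hypothetical two-node views: identical paths, then one node lacking the log volume.
sym = {"NODE1": {r"E:\SQLData", r"F:\SQLLog"}, "NODE2": {r"E:\SQLData", r"F:\SQLLog"}}
asym = {"NODE1": {r"E:\SQLData", r"F:\SQLLog"}, "NODE2": {r"E:\SQLData"}}
```

Capacity and physical layout are deliberately ignored, matching the definition above that only the logical configuration and file paths must match.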
Note: SQL Server FCIs still require symmetrical shared storage to be accessible by all possible node owners of the instance. However, with the introduction of AlwaysOn Availability Groups, you can now deploy different non-FCI instances of SQL Server in a WSFC cluster, each with its own unique, dedicated, local or remote storage.
WSFC Resource Health Detection and Failover
Each resource in a WSFC cluster node can report its status and health, periodically or on demand. A variety of circumstances may indicate a cluster resource failure, including power failure, disk or memory errors, network communication errors, misconfiguration, or nonresponsive services.
You can make WSFC cluster resources such as networks, storage, or services dependent upon one another. The cumulative health of a resource is determined by successive rollup of its health with the health of each of its resource dependencies.
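The rollup can be read as a recursive rule: a resource is healthy only if it and everything it depends on are healthy. This is a minimal sketch of that rule, not the cluster service's implementation; the resource labels are hypothetical and loosely follow the FCI dependencies described in the next paragraph.

```python
def cumulative_health(resource: str, own_health: dict[str, bool],
                      depends_on: dict[str, list[str]]) -> bool:
    """A resource is healthy only if it and, recursively, every resource it
    depends on are healthy."""
    in_progress: set[str] = set()

    def healthy(r: str) -> bool:
        if r in in_progress:
            raise ValueError(f"circular dependency involving {r!r}")
        in_progress.add(r)
        ok = own_health[r] and all(healthy(dep) for dep in depends_on.get(r, []))
        in_progress.discard(r)
        return ok

    return healthy(resource)


# Hypothetical FCI resources: both services depend on the virtual network name.
deps = {"SQL Server": ["Virtual Network Name"], "SQL Server Agent": ["Virtual Network Name"]}
health = {"SQL Server": True, "SQL Server Agent": True, "Virtual Network Name": False}
```

A failed virtual network name therefore makes both services report as unhealthy even though each service is itself running.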
For AlwaysOn Availability Groups, the availability group and the availability group listener are registered as WSFC cluster resources. For AlwaysOn Failover Cluster Instances, the SQL Server service and the SQL Server Agent service are registered as WSFC cluster resources, and both are made dependent upon the instance's virtual network name resource.
If a WSFC cluster resource experiences a set number of errors or failures over a period of time, the configured failover policy causes the cluster service to do one of the following:
Restart the resource on the current node.
Set the resource offline.
Initiate an automatic failover of the resource and its dependencies to another node.
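The "set number of failures over a period of time" rule behaves like a sliding-window counter. This is a hypothetical model for illustration only, not the cluster service's actual algorithm; the real restart and failover thresholds are configured through WSFC properties described in its documentation. The class and parameter names are assumptions.

```python
from collections import deque
from enum import Enum


class Response(Enum):
    RESTART = "restart the resource on the current node"
    OFFLINE = "set the resource offline"
    FAIL_OVER = "fail over the resource and its dependencies"


class FailurePolicy:
    """Hypothetical sliding-window policy: restart while failures within the
    last `period_s` seconds stay at or below `max_restarts`, then escalate."""

    def __init__(self, max_restarts: int, period_s: float, allow_failover: bool = True):
        self.max_restarts = max_restarts
        self.period_s = period_s
        self.allow_failover = allow_failover
        self._failures: deque = deque()

    def on_failure(self, t: float) -> Response:
        self._failures.append(t)
        # Forget failures that fall outside the sliding window.
        while self._failures and t - self._failures[0] > self.period_s:
            self._failures.popleft()
        if len(self._failures) <= self.max_restarts:
            return Response.RESTART
        return Response.FAIL_OVER if self.allow_failover else Response.OFFLINE
```

With `max_restarts=2` and a 900-second window, three failures within 15 minutes escalate to failover, while the same three failures spread across an hour each trigger only a restart.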
Note: WSFC cluster resource health detection has no direct impact on the individual node's health or the overall health of the cluster.
WSFCClusterValidationWizard
TheclustervalidationwizardisafeaturethatisintegratedintofailoverclusteringinWindowsServer
2008andWindowsServer2008R2.Itisakeytoolforadatabaseadministratortousetohelpensure
thataclean,healthy,stableWSFCenvironmentexists,beforedeployingaSQLServerAlwaysOnsolution.
Withtheclustervalidationwizard,youcanrunasetoffocusedtestsoneitheracollectionofservers
thatyouintendtouseasnodesinacluster,oronanexistingcluster.Thisprocessteststheunderlying
hardwareandsoftwaredirectly,andindividually,toobtainanaccurateassessmentofhowwellaWSFC
clusterwouldbesupportedonagivenconfiguration.
This validation process consists of a series of tests and data collection on each node in these categories:
Inventory. Information on BIOS versions, environment levels, host bus adapters, RAM, operating system versions, devices, services, drivers, and so on.
Network. Information on NIC binding order, network communications, IP configuration, and firewall configuration. Validates inter-node communications on all NICs.
Storage. Information on disks, drive capacity, access latency, file systems, and so on. Validates SCSI commands, disk failover functionality, and symmetric or asymmetric storage configuration.
System configuration. Validates Active Directory configuration, driver signing, memory dump settings, required operating system features and services, compatible processor architecture, and service pack and Windows Software Update levels.
The results of these validation tests give you the information needed to fine-tune a cluster configuration, track the configuration, and identify potential cluster configuration issues before they cause downtime. You can save a report of the test results as an HTML document for later reference.
You should run these tests before and after you make any changes to the WSFC configuration, before you install SQL Server, and as part of any disaster recovery process. A cluster validation report is required by Microsoft Customer Support Services (CSS) as a condition of Microsoft supporting a given WSFC cluster configuration.
For more information, see Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster (http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx).
Note: If your cluster configuration has asymmetric storage, as is the case with hardware-based geo-clustering storage solutions, or as may be the case with AlwaysOn Availability Groups, you may need to apply a number of hotfixes to prevent the cluster validation wizard from failing the storage validation steps.
For more information, see Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff878487(SQL.110).aspx#SystemReqsForAOAG).
WSFC Quorum Modes and Voting Configuration
WSFC uses a quorum-based approach to monitor overall cluster health and maximize node-level fault tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster recovery solution.
Cluster Health Detection by Quorum
Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's health status with the other nodes. Unresponsive nodes are considered to be in a failed state.
A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means that the cluster is healthy enough to provide node-level fault tolerance.
The absence of a quorum indicates that the cluster is not healthy. Overall WSFC cluster health must be maintained in order to ensure that healthy secondary nodes are available for primary nodes to fail over to. If the quorum vote fails, the entire WSFC cluster is set offline as a precautionary measure. This also causes all SQL Server instances registered with the cluster to be stopped.
Note: If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring it back online. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.
Quorum Modes
A quorum mode is configured at the WSFC cluster level to specify the methodology used for quorum voting. The Failover Cluster Manager utility recommends a quorum mode based on the number of nodes in the cluster.
One of the following quorum modes determines what constitutes a quorum of votes:
Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the cluster to be healthy.
Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file share is also configured as a voting witness, and connectivity from any node to that share is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.
As a best practice, the witness file share should not reside on any node in the cluster, and it should be visible to all nodes in the cluster.
Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster resource is also designated as a voting witness, and connectivity from any node to that shared disk is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.
Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to that shared disk is counted as an affirmative vote.
For more information, see Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Cluster (http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx).
Note: Unless each node in the cluster is configured to use the same shared storage quorum witness disk, you should generally use the Node Majority quorum mode if you have an odd number of voting nodes, or the Node and File Share Majority quorum mode if you have an even number of voting nodes.
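The vote counting behind these modes, and the rule of thumb in the note above, can be sketched as follows. This is a simplified model, assuming every voting node and witness carries exactly one vote; the function names are hypothetical and do not correspond to any WSFC API.

```python
from enum import Enum


class QuorumMode(Enum):
    NODE_MAJORITY = "Node Majority"
    NODE_AND_FILE_SHARE_MAJORITY = "Node and File Share Majority"
    NODE_AND_DISK_MAJORITY = "Node and Disk Majority"
    DISK_ONLY = "Disk Only"


def has_quorum(mode: QuorumMode, voting_nodes: int, responding_nodes: int,
               witness_reachable: bool) -> bool:
    """Return True if the affirmative votes constitute a quorum under `mode`.

    Assumes one vote per voting node and one vote for the witness, if any.
    """
    if mode is QuorumMode.DISK_ONLY:
        # Only connectivity to the shared disk witness counts.
        return witness_reachable
    has_witness = mode is not QuorumMode.NODE_MAJORITY
    possible = voting_nodes + (1 if has_witness else 0)
    affirmative = responding_nodes + (1 if has_witness and witness_reachable else 0)
    return 2 * affirmative > possible  # strictly more than half


def suggested_mode(voting_nodes: int, shared_witness_disk: bool) -> QuorumMode:
    """Rule of thumb from the note above."""
    if shared_witness_disk:
        # Assumption: the note implies a disk witness mode in this case.
        return QuorumMode.NODE_AND_DISK_MAJORITY
    if voting_nodes % 2 == 1:
        return QuorumMode.NODE_MAJORITY
    return QuorumMode.NODE_AND_FILE_SHARE_MAJORITY


# Four voting nodes, two of them unreachable:
print(has_quorum(QuorumMode.NODE_MAJORITY, 4, 2, False))                # False (2 of 4)
print(has_quorum(QuorumMode.NODE_AND_FILE_SHARE_MAJORITY, 4, 2, True))  # True (3 of 5)
```

The second call shows why an even node count pairs well with a file share witness: the extra vote breaks the 2-2 tie.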
Voting and Non-Voting Nodes
By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node, file share witness, and disk witness has a single vote in determining the overall cluster health. The quorum discussion to this point in this paper has carefully qualified the set of WSFC cluster nodes that vote on cluster health as voting nodes. In some circumstances, you may not want every node to have a vote.
Each node in a WSFC cluster continuously attempts to establish a quorum. No individual node in the cluster can definitively determine that the cluster as a whole is healthy or unhealthy. At any given moment, from the perspective of each node, some of the other nodes may appear to be offline, appear to be in the process of failover, or appear unresponsive due to a network communication failure. A key function of the quorum vote is to determine whether the apparent state of each node in the WSFC cluster is indeed the actual state of those nodes.
For all of the quorum models except Disk Only, the effectiveness of a quorum vote depends on reliable communications among all of the voting nodes in the cluster. You can generally trust the quorum vote when all nodes are on the same physical subnet.
However, if a node on another subnet is seen as nonresponsive in a quorum vote, but it is actually online and otherwise healthy, that is most likely due to a network communications failure between subnets. Depending upon the cluster topology, quorum mode, and failover policy configuration, that network communications failure may effectively create more than one set (or subset) of voting nodes.
If more than one subset of voting nodes is able to establish a quorum on its own, that is known as a split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and in conflict with one another.
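Why a strict majority rule alone cannot produce a split-brain, while a manually forced quorum can, is easy to check exhaustively for a small cluster. This sketch assumes one vote per voter and models a communications failure as an assignment of voters to partitions; the node and witness names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, FrozenSet, List


def partitions_with_quorum(side_of: Dict[str, str],
                           forced: FrozenSet[str] = frozenset()) -> List[str]:
    """Return the network partitions that would consider themselves quorate.

    `side_of` maps each voter (node or witness) to the partition it can still
    reach. A partition holds quorum if it has strictly more than half of all
    votes, or if an administrator has forced quorum in it.
    """
    total = len(side_of)
    counts = Counter(side_of.values())
    return sorted(p for p, n in counts.items() if 2 * n > total or p in forced)


def split_brain_possible(voters: List[str], partitions: str = "AB") -> bool:
    """Exhaustively test every way of splitting `voters` across `partitions`."""
    for sides in product(partitions, repeat=len(voters)):
        if len(partitions_with_quorum(dict(zip(voters, sides)))) > 1:
            return True
    return False


voters = ["NODE1", "NODE2", "NODE3", "NODE4", "FSW"]
print(split_brain_possible(voters))  # False: two disjoint majorities cannot exist
split = {"NODE1": "A", "NODE2": "A", "FSW": "A", "NODE3": "B", "NODE4": "B"}
print(partitions_with_quorum(split))                           # ['A']
print(partitions_with_quorum(split, forced=frozenset({"B"})))  # ['A', 'B']
```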
Note: The split-brain scenario is possible only if a system administrator manually performs a forced quorum operation, or in very rare circumstances, a forced manual failover, explicitly subdividing the quorum node set. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.
To simplify your quorum configuration and increase uptime, you may want to set the NodeWeight setting (which accepts a value of 0 or 1) to 0 on selected nodes so that their votes are not counted towards the quorum.
Recommended Adjustments to Quorum Voting
To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in sequential order:
1. No vote by default. Assume that each node should not vote without explicit justification.
2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.
3. Include possible automatic failover owners. Each node that could host a primary replica or FCI as the result of an automatic failover should have a vote.
4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.
5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL Server instance), or a witness disk to the cluster and adjust the quorum mode to prevent possible ties in the quorum vote.
6. Reassess vote assignments post-failover. You do not want to fail over into a cluster configuration that does not support a healthy quorum.
For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight Settings (http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx).
You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to include or exclude its vote.
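Guidelines 1 through 5 can be expressed as a simple function that proposes a NodeWeight (0 or 1) for each node. This is a hypothetical planning aid, assuming the node attributes shown are known in advance; guideline 6 is left as a manual reassessment, witnesses are only flagged rather than modeled, and the node names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClusterNode:
    name: str
    hosts_primary: bool = False              # AG primary replica or FCI preferred owner
    automatic_failover_target: bool = False  # could become primary automatically
    secondary_site: bool = False             # resides at the DR site


def recommend_node_weights(nodes: List[ClusterNode]) -> Dict[str, int]:
    """Apply guidelines 1-4 in order; a later guideline overrides an earlier one."""
    weights: Dict[str, int] = {}
    for node in nodes:
        weight = 0                                                # 1. no vote by default
        if node.hosts_primary or node.automatic_failover_target:  # 2 and 3
            weight = 1
        if node.secondary_site:                                   # 4. exclude the DR site
            weight = 0
        weights[node.name] = weight
    return weights


def needs_witness(weights: Dict[str, int]) -> bool:
    """Guideline 5: add a witness if the vote total is even (a tie is possible)."""
    return sum(weights.values()) % 2 == 0


nodes = [ClusterNode("SQL1", hosts_primary=True),
         ClusterNode("SQL2", automatic_failover_target=True),
         ClusterNode("SQL3"),
         ClusterNode("DR1", secondary_site=True)]
weights = recommend_node_weights(nodes)
print(weights)                 # {'SQL1': 1, 'SQL2': 1, 'SQL3': 0, 'DR1': 0}
print(needs_witness(weights))  # True: two votes could split 1-1
```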
Note: SQL Server exposes several system dynamic management views (DMVs) that can help you administer settings related to WSFC cluster configuration and node quorum voting.
For more information, see Monitor Availability Groups (http://msdn.microsoft.com/en-us/library/ff878305(SQL.110).aspx).
WSFC Disaster Recovery through Forced Quorum
Quorum failure is usually caused by a systemic disaster or a persistent communications failure involving several nodes in the WSFC cluster. Remember that quorum failure causes all clustered services, SQL Server instances, and Availability Groups in the WSFC cluster to be set offline, because the cluster cannot ensure node-level fault tolerance. A quorum failure means that healthy voting nodes in the WSFC cluster no longer satisfy the quorum model. Some nodes may have failed completely, and some may have just shut down the WSFC service and are otherwise healthy, except for the loss of the ability to communicate with a quorum.
To bring the WSFC cluster back online, you must correct the root cause of the quorum failure on at least one node under the existing configuration. In a disaster scenario, you may need to reconfigure or identify alternative hardware to use. You may also want to reconfigure the remaining nodes in the WSFC cluster to reflect the surviving cluster topology.
You can use the forced quorum procedure on a WSFC cluster node to override the safety controls that took the cluster offline. This effectively tells the cluster to suspend the quorum voting checks, and lets you bring the WSFC cluster resources and SQL Server back online on any of the nodes in the cluster.
This type of disaster recovery process should include the following steps:
1) Determine the scope of the failure. Identify which availability groups or SQL Server instances are nonresponsive and which cluster nodes are online and available for post-disaster use, and then examine the Windows event logs and the SQL Server system logs. Where practical, you should preserve forensic data and system logs for later analysis.
2) Start the WSFC cluster by using forced quorum on a single node. On an otherwise healthy node, manually force the cluster to come online using the forced quorum procedure. To minimize potential data loss, select a node that was last hosting an availability group primary replica.
For more information, see Force a WSFC Cluster to Start Without a Quorum (http://msdn.microsoft.com/en-us/library/hh270275(v=SQL.110).aspx).
Note: If you use the forced quorum setting, quorum checks are blocked cluster-wide until the WSFC cluster achieves a majority of votes and automatically transitions to a regular quorum mode of operation.
3) Start the WSFC service normally on each otherwise healthy node, one at a time. You do not have to specify the forced quorum option when you start the cluster service on the other nodes.
As the WSFC service on each node comes back online, it negotiates with the other healthy nodes to synchronize the new cluster configuration state. Remember to do this one node at a time to prevent potential race conditions in resolving the last known state of the cluster.
Note: Ensure that each node that you start can communicate with the other newly online nodes, or you run the risk of creating more than one quorum node set; that is a split-brain scenario. If your findings in step 1 are accurate, this should not occur.
4) Apply new quorum mode and node vote configuration. If you successfully restarted all nodes in the cluster using the forced quorum procedure, and if you corrected the root cause of the quorum failure, you do not need to make changes to the original quorum mode and node vote configuration.
Otherwise, you should evaluate the newly recovered cluster node and availability replica topology, and change the quorum mode and vote assignments for each node as appropriate. Set the WSFC cluster service on unrecovered nodes offline, or set their node votes to zero.
Note: At this point, the nodes and SQL Server instances in the cluster may appear to be restored back to regular operation. However, a healthy quorum may still not exist. Using Failover Cluster Manager, the AlwaysOn Dashboard within SQL Server Management Studio, or the appropriate DMVs, verify that a healthy quorum has been restored.
5) Recover availability group database replicas as needed. Some databases may recover and come back online on their own as part of the regular SQL Server startup process. The recovery of other databases may require additional manual steps.
You can minimize potential data loss and recovery time for the availability group replicas by bringing them back online in this sequence, if possible: primary replica, synchronous secondary replicas, asynchronous secondary replicas.
6) Repair or replace failed components and revalidate the cluster. Now that you have recovered from the initial disaster and quorum failure, you should repair or replace the failed nodes and adjust related WSFC and AlwaysOn configurations accordingly. This can include dropping availability group replicas, evicting nodes from the cluster, or flattening and reinstalling software on a node.
Note: You must repair or remove all failed availability replicas. SQL Server 2012 does not truncate the transaction log past the last known point of the farthest-behind availability replica. If a failed replica is not repaired or removed from the availability group, the transaction logs will grow, and you will run the risk of running out of transaction log space on the other replicas.
7) Repeat step 4 as needed. The goal is to reestablish the appropriate level of fault tolerance and high availability for healthy operations.
8) Conduct RPO/RTO analysis. You should analyze SQL Server system logs, database timestamps, and Windows event logs to determine the root cause of the failure, and to document actual Recovery Point and Recovery Time experiences.
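Two details of the recovery steps above, the replica recovery sequence in step 5 and the log truncation constraint in the step 6 note, can be sketched as small helpers. This is a simplified model, assuming each replica's last known role, availability mode, and hardened LSN are known; real LSNs are structured values rather than plain integers, other truncation factors such as log backups are ignored, and the replica names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Replica:
    name: str
    role: str          # last known role: "primary" or "secondary"
    synchronous: bool  # availability mode is synchronous commit


def recovery_order(replicas: List[Replica]) -> List[str]:
    """Step 5: primary first, then synchronous, then asynchronous secondaries.
    sorted() is stable, so replicas of equal rank keep their input order."""
    def rank(r: Replica) -> int:
        if r.role == "primary":
            return 0
        return 1 if r.synchronous else 2
    return [r.name for r in sorted(replicas, key=rank)]


def log_truncation_point(hardened_lsn: Dict[str, int]) -> int:
    """Step 6 note: the log is kept back to the farthest-behind replica.
    A failed replica keeps its last known LSN until repaired or removed."""
    return min(hardened_lsn.values())


replicas = [Replica("DR1", "secondary", False),
            Replica("SQL2", "secondary", True),
            Replica("SQL1", "primary", True)]
print(recovery_order(replicas))  # ['SQL1', 'SQL2', 'DR1']

lsns = {"SQL1": 5_200, "SQL2": 5_000, "DR1": 1_200}  # DR1 failed at LSN 1200
print(log_truncation_point(lsns))  # 1200: the log cannot be truncated past this
del lsns["DR1"]                    # remove the failed replica from the group
print(log_truncation_point(lsns))  # 5000
```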
SQL Server Instance-Level Protection
The next layer of protection in an AlwaysOn solution is the data platform itself; these are the capabilities and features offered by Microsoft SQL Server 2012 and its integration with Windows Server infrastructure components.
Availability Improvements: SQL Server Instances
These are new SQL Server 2012 instance-level features that enhance availability for both AlwaysOn Failover Cluster Instances and standalone instances that host AlwaysOn Availability Groups. These improvements represent enhancements for managing and troubleshooting failover scenarios:
Flexible Failover Policy. The output of sp_server_diagnostics, the new system stored procedure used for robust failure detection, is evaluated against the FailureConditionLevel property, which conveys the severity of a failure affecting the SQL Server instance that warrants a response. A WSFC failover policy governs how this value impacts the SQL Server instance, ranging from relative tolerance of errors to sensitivity to any SQL Server internal component error.
You can configure failover to be triggered by any one of a range of error levels, including server down, server unresponsive, critical error, moderate error, or any qualified error. The FailureConditionLevel property can be used for FCI or availability group failover policies.
Prior to SQL Server 2012, there was no granularity of error conditions to govern failover; any service-level failure caused failover.
For more information, see Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx).
Enhanced instrumentation and logging. There are a number of AlwaysOn-specific system configuration views, DMVs, performance counters, and an extended event health session that capture and dump the information needed to troubleshoot, tune, and monitor your AlwaysOn deployment. Many of these are exposed via new SQL Server Policy Management facets and policies.
For more information, see AlwaysOn Availability Groups Dynamic Management Views and Functions (http://msdn.microsoft.com/en-us/library/ff877943(SQL.110).aspx), and sys.dm_os_cluster_nodes (http://msdn.microsoft.com/en-us/library/ms187341(SQL.110).aspx).
SMB file share support. You can place database files on a Windows Server 2008 or later remote file share for both standalone and failover cluster instances, negating the need for a separate drive letter per FCI. This is a good option for storage consolidation or for hosting database file storage on a physical server for a virtual machine guest operating system. With the right configuration, I/O performance can closely approach that of direct-attached storage.
For more information, see SQL Databases on File Shares - It's time to reconsider the scenario (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sqldatabasesonfilesharesitstimetoreconsiderthescenario.aspx).
Note: In a WSFC cluster, you cannot add an SMB file share resource dependency to the SQL Server resource group; you must take separate measures to ensure the availability of the file share. If the file share becomes unavailable, SQL Server throws an I/O exception and goes offline.
WSFC interoperability with DNS. The virtual network name (VNN) for an FCI or availability group listener is registered with DNS only during VNN creation or during configuration changes. All virtual IP addresses, regardless of online or offline state, are registered with DNS under the same virtual network name. Client calls to resolve the virtual network name in DNS return all of the registered IP addresses in a varying round-robin sequence.
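The effect of registering every virtual IP address, online or not, can be illustrated with a toy resolver. This sketch assumes a simple rotate-by-one round robin; real DNS servers may order answers differently, and the addresses are hypothetical. It shows that a client that only tries the first answer will sometimes be handed an offline address.

```python
from typing import List


class RoundRobinResolver:
    """Toy DNS model: every lookup returns all registered addresses,
    rotated by one position per query."""

    def __init__(self, addresses: List[str]) -> None:
        self._addresses = list(addresses)
        self._offset = 0

    def resolve(self) -> List[str]:
        n = len(self._addresses)
        if n == 0:
            return []
        answer = [self._addresses[(self._offset + i) % n] for i in range(n)]
        self._offset = (self._offset + 1) % n
        return answer


# Hypothetical VNN registered on two subnets; only one address is online.
dns = RoundRobinResolver(["10.0.1.50", "10.0.2.50"])
print(dns.resolve())  # ['10.0.1.50', '10.0.2.50']
print(dns.resolve())  # ['10.0.2.50', '10.0.1.50']
```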
AlwaysOn Failover Cluster Instances
The primary purpose of an AlwaysOn SQL Server Failover Cluster Instance (FCI) is to enhance availability of a SQL Server instance hosted on local server and storage hardware within a single data center.
An FCI is a single logical SQL Server instance that is installed across nodes in a Windows Server Failover Clustering (WSFC) cluster, but is active on only one node at a time. Client applications connect to a virtual network name and virtual IP address that are owned by the active cluster node.
Each installed node has an identical configuration and set of SQL Server binaries. The WSFC cluster service also replicates relevant changes from the active instance's entries in the Windows registry to each installed node. Each node that the FCI is installed on is designated as a possible owner of the instance and its resources, within a preferred failover sequence.
Database files are stored on shared symmetrical storage volumes that are registered as resources with the WSFC cluster and are owned by the node that currently hosts the FCI.
For more information, see AlwaysOn Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ms189134(SQL.110).aspx).
FCI Failover Process
If a dependent cluster resource fails, an AlwaysOn Failover Cluster Instance interacts with the WSFC cluster service using this high-level process to perform a failover:
1) A restart is indicated. A periodic check of the WSFC or SQL Server Failover Policy configuration indicates a failed state. By default, a service restart is attempted before a failover to another node is initiated. A timeout in the restart attempt indicates a resource failure.
2) A failover is indicated. A Failover Policy check indicates the need for a node failover.
3) The SQL Server service is stopped. If the service is currently running, an orderly shutdown of the SQL Server service is attempted.
4) The WSFC cluster resource is transferred. Ownership of the SQL Server cluster resource group and its dependent network and shared storage resources is transferred to the next preferred node owner of the FCI.
5) SQL Server is started on the new node. The SQL Server instance goes through its normal startup procedures. If it does not come back online within a pending timeout period, the cluster service puts the resource on this new node in a failed state.
6) User databases are recovered on the new node. Each user database is placed in recovery mode while transaction log redo operations are applied and uncommitted transactions are rolled back.
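The six steps can be traced with a small simulation. This is a hypothetical sketch: restart_ok and start_ok stand in for the outcomes of the restart and startup attempts, the node names are illustrative, and the choice to try the next preferred owner after a failed start is an assumption of the sketch (the steps above only say that the resource is put in a failed state).

```python
from typing import Callable, List, Optional, Tuple


def fci_failover(preferred_owners: List[str], current: str,
                 restart_ok: Callable[[str], bool],
                 start_ok: Callable[[str], bool]) -> Tuple[Optional[str], List[str]]:
    """Simulate the high-level FCI failover steps; return (new owner, event log)."""
    events: List[str] = []
    # 1) A restart is indicated: try restarting on the current node first.
    events.append(f"restart attempted on {current}")
    if restart_ok(current):
        return current, events
    # 2)-3) A failover is indicated: stop the SQL Server service.
    events.append(f"SQL Server stopped on {current}")
    for node in preferred_owners:
        if node == current:
            continue
        # 4) Transfer the resource group to the next preferred owner.
        events.append(f"resource group moved to {node}")
        # 5) Start SQL Server; a start that exceeds the pending timeout fails.
        if start_ok(node):
            # 6) Recover user databases: redo the log, roll back uncommitted work.
            events.append(f"user databases recovered on {node}")
            return node, events
        events.append(f"{node} marked failed (pending timeout)")
    return None, events


owner, log = fci_failover(["NODE1", "NODE2", "NODE3"], current="NODE1",
                          restart_ok=lambda n: False,
                          start_ok=lambda n: n == "NODE3")
print(owner)  # NODE3
```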
FCI Improvements
Previous versions of SQL Server have offered an FCI installation option; however, several feature enhancements in SQL Server 2012 improve availability, robustness, and serviceability:
Multi-subnet clustering. SQL Server 2012 supports WSFC cluster nodes that reside in more than one subnet. A given SQL Server instance that resides on a WSFC cluster node can start if any network interface is available; this is known as an OR cluster resource dependency.
Prior versions of SQL Server required that all network interfaces be functional for the SQL Server service to start or fail over, and that they all exist on the same subnet or VLAN.
Note: Storage-level replication between cluster nodes is not implicitly enabled with multi-subnet clustering. Your multi-subnet FCI solution must leverage a third-party SAN-based solution to replicate data and coordinate storage failover between cluster nodes.
For more information, see SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sqlserver2012alwayson_3a00_multisitefailoverclusterinstance.aspx).
Robust failure detection. The WSFC cluster service maintains a dedicated administrative connection to each SQL Server 2012 FCI on the node. On this connection, a periodic call to a special system stored procedure, sp_server_diagnostics, returns a rich array of system health diagnostic information.
Prior to SQL Server 2012, the primary health detection mechanism for an FCI was implemented as a simple one-way polling process. In this process, the WSFC cluster service periodically created a new SQL client connection to the instance, queried the server name, and then disconnected. A failure to connect, or a query timeout, for whatever reason, triggered a failover with very little available diagnostic information.
For more information, see sp_server_diagnostics (http://msdn.microsoft.com/en-us/library/ff878233(SQL.110).aspx).
There is now broader support for FCI storage scenarios:
Better mount point support. SQL Server setup now recognizes cluster disk mount point settings. The specified cluster disks and all disks mounted to them are automatically added to the SQL Server resource dependency during setup.
tempdb on local storage. FCIs now support placement of tempdb on local non-shared storage, such as a local solid-state drive, potentially offloading a significant amount of I/O from a shared SAN.
Prior to SQL Server 2012, FCIs required tempdb to be located on a symmetrical shared storage volume that failed over with other system databases.
Note: The location of tempdb is stored in the master database, which moves between nodes during failover. It must be a valid symmetrical file path (drive, folders, and permissions) on all potential node owners, or else the SQL Server service will not start on some nodes.
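The difference between the OR cluster resource dependency introduced with multi-subnet clustering and the all-interfaces requirement of prior versions comes down to any() versus all(). A minimal sketch; the IP addresses are hypothetical, and a real WSFC dependency expression can be more complex than a single OR or AND.

```python
from typing import Dict


def can_start(ip_online: Dict[str, bool], or_dependency: bool) -> bool:
    """OR dependency (SQL Server 2012 multi-subnet): any address suffices.
    AND dependency (prior versions): every address must be online."""
    states = list(ip_online.values())
    return any(states) if or_dependency else all(states)


# Hypothetical addresses, one per subnet; the first subnet is down.
addresses = {"10.0.1.50": False, "10.0.2.50": True}
print(can_start(addresses, or_dependency=True))   # True
print(can_start(addresses, or_dependency=False))  # False
```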
Database Availability
The high availability capabilities offered by the infrastructure and SQL Server instance-level components work together to implicitly protect hosted databases. An AlwaysOn solution offers an additional set of options for explicitly protecting database data and data-tier applications.
AlwaysOn Availability Groups
An availability group is a set of user databases that fail over together from one SQL Server instance to another within the same WSFC cluster. Client applications can connect to the availability group's databases through a WSFC virtual network name, known as an availability group listener, which abstracts the underlying SQL Server instances.
AlwaysOn Availability Groups rely upon Windows Server Failover Clustering for health monitoring, failover coordination, and server connectivity. You must enable AlwaysOn support on a SQL Server instance that resides on a WSFC cluster node. However, that instance does not have to be an FCI, and it does not require the use of symmetrical shared storage.
For more information, see Overview of AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff877884(SQL.110).aspx).
Availability Replicas and Roles
Each SQL Server instance in the availability group hosts an availability replica that contains a copy of the user databases in the availability group. A SQL Server instance can host only one availability replica from a given availability group, but multiple availability groups may reside on the same instance. The SQL Server instance must have dedicated (non-shared) storage volumes.
One of the availability replicas serves in the role of primary replica. It is designated as the master copy of the availability group databases and is enabled for read/write operations.
An availability group can contain from one to four additional read-only availability replicas that each separately serve in the role of a secondary replica.
Availability Replica Synchronization
The contents of each database in an availability group are synchronized from the primary replica to each of the secondary replicas through a mechanism of SQL Server log-based data movement. For this reason, all databases in the availability group must be set to the full recovery model.
Secondary replicas are initialized with a full backup and restore of the primary replica's databases and transaction logs. As new transactions are committed on the primary replica, the corresponding portion of the transaction log is cached, queued, and then sent over the network to a database mirroring endpoint on each of the secondary replica nodes.
In this manner, new entries in the primary replica transaction log are appended onto each of the secondary replicas' transaction logs. Each secondary replica periodically communicates a log sequence number (LSN) back to the primary replica to indicate a watermark of how much of its transaction log has been hardened and flushed to the remote disk.
Note: Each availability replica has its own set of independent transaction log redo threads that are not part of the availability replica synchronization process. You may perceive delays in the log redo process on the secondary replicas as data latency.
In addition to having a role of primary or secondary, each availability replica also has an availability mode, which governs the coordination of hardening the transaction logs during a COMMIT TRAN statement:
Synchronous commit mode. The primary replica commits a given transaction only after all synchronous commit secondary replicas acknowledge that they have finished hardening their respective transaction logs past that transaction's LSN. An availability group can have up to two synchronous commit secondary replicas.
Synchronous commit mode introduces transaction latency on the primary replica databases, but it ensures that there is no data loss on the secondary replicas for committed transactions.
Asynchronous commit mode. The primary replica commits transactions after hardening the local transaction log, but it does not wait for acknowledgement that an asynchronous commit secondary replica has hardened its transaction log. An availability group can have up to four asynchronous commit secondary replicas, but no more than a total of four secondary replicas of any type.
Asynchronous commit mode minimizes transaction latency on the primary replica databases but allows the secondary replica transaction logs to lag behind, making some data loss possible.
For more information, see Availability Modes (http://msdn.microsoft.com/en-us/library/ff877931(SQL.110).aspx).
The overall health of the data flow between the availability replicas is indicated by the synchronization state of each replica. You will most likely experience data loss if you fail over to a secondary replica with a synchronization state of anything other than Synchronized or Synchronizing.
Each secondary replica's synchronization stream has a session timeout property. When a secondary replica configured for synchronous commit availability mode fails with a session timeout, it is temporarily marked internally as asynchronous. This is done so that the secondary replica failure does not impact hardening of the transaction log on the primary replica. After that secondary replica is healthy and has caught back up with the primary replica, it automatically reverts to normal synchronous commit mode operations.
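The commit rule for the two availability modes, together with the temporary session-timeout downgrade, can be sketched as a predicate on hardened LSNs. This is a simplified model, assuming LSNs are plain increasing integers and that "hardened past the transaction's LSN" means a reported hardened LSN at or beyond it; the class and function names are hypothetical, and the replica limits are the SQL Server 2012 figures stated above.

```python
from dataclasses import dataclass
from typing import List

MAX_SECONDARIES = 4       # limits stated above for SQL Server 2012
MAX_SYNC_SECONDARIES = 2


@dataclass
class Secondary:
    name: str
    synchronous: bool               # configured availability mode
    hardened_lsn: int               # last LSN reported back as hardened
    session_timed_out: bool = False


def config_is_valid(secondaries: List[Secondary]) -> bool:
    sync_count = sum(s.synchronous for s in secondaries)
    return len(secondaries) <= MAX_SECONDARIES and sync_count <= MAX_SYNC_SECONDARIES


def can_commit(txn_lsn: int, primary_hardened_lsn: int,
               secondaries: List[Secondary]) -> bool:
    """The primary always hardens its own log first. It then waits only for
    synchronous secondaries that have not timed out; a timed-out replica is
    treated as asynchronous until it catches up."""
    if primary_hardened_lsn < txn_lsn:
        return False
    return all(s.hardened_lsn >= txn_lsn
               for s in secondaries
               if s.synchronous and not s.session_timed_out)


group = [Secondary("SQL2", synchronous=True, hardened_lsn=100),
         Secondary("DR1", synchronous=False, hardened_lsn=40)]
print(config_is_valid(group))       # True
print(can_commit(101, 101, group))  # False: SQL2 has hardened only to LSN 100
```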
Availability Group Failover
The availability group and a corresponding virtual network name are registered as resources in the WSFC cluster. An availability group fails over at the level of an availability replica, based upon the health and failover policy of the primary replica.
An availability group failover policy uses the FailureConditionLevel property, in conjunction with the sp_server_diagnostics system stored procedure, to indicate the severity tolerance level for a failure affecting the availability group. This same mechanism is used for FCI failover policies.
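Because the same FailureConditionLevel mechanism governs both FCI and availability group failover policies, its cumulative behavior can be sketched once. This assumes that the numeric levels 1 through 5 correspond, in order, to the server down, server unresponsive, critical error, moderate error, and any qualified error conditions listed earlier in this paper, with each level also reacting to every more severe condition; confirm the mapping against the Failover Policy for Failover Cluster Instances article. The function is illustrative only.

```python
from enum import IntEnum


class FailureCondition(IntEnum):
    """Detected conditions, ordered from most to least severe.
    Assumption: values 1-5 mirror the FailureConditionLevel settings."""
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_ERROR = 3
    MODERATE_ERROR = 4
    ANY_QUALIFIED_ERROR = 5


def triggers_failover(failure_condition_level: int,
                      detected: FailureCondition) -> bool:
    """A higher level is more sensitive: it reacts to its own condition and
    to every more severe condition (lower number)."""
    return detected <= failure_condition_level


print(triggers_failover(3, FailureCondition.SERVER_DOWN))     # True
print(triggers_failover(3, FailureCondition.MODERATE_ERROR))  # False
print(triggers_failover(5, FailureCondition.MODERATE_ERROR))  # True
```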
In the event of a failover, instead of transferring ownership of shared physical resources to another node, WSFC is leveraged to reconfigure a secondary replica on another SQL Server instance to take over the role of primary replica. The availability group's virtual network name resource is then transferred to that instance. All client connections to the involved availability replicas are reset.
Based upon the current health, synchronization state, and availability mode of the replicas, each replica has a composite failover readiness state that indicates the potential for data loss. This replica health information is viewable in the AlwaysOn Dashboard, or in the sys.dm_hadr_availability_replica_states system view.
Each availability replica also has a configured failover mode, which governs replica behavior when failover is indicated.
Automatic failover (without data loss). This allows for the fastest failover time of any AlwaysOn configuration because the secondary replica transaction log is already hardened and synchronized. Open transactions on the primary replica are rolled back, and the primary replica role is transferred to a secondary replica without any user intervention.
The primary and secondary replicas must be set to automatic failover mode, and both must be set to synchronous commit availability mode. The synchronization state between the replicas must be Synchronized. Additionally, the WSFC cluster must have a healthy quorum.
Automatic failover is not supported if the primary or secondary replica resides on an FCI. This is blocked to prevent a potential race condition between availability group and FCI failovers.
Manual failover. This allows the administrator to assess the state of the primary replica and decide whether to deliberately fail over to a secondary replica.
Depending upon the availability mode and synchronization state, you have these choices:
o Planned manual failover (without data loss). You can perform this type of failover only if both the primary and secondary replicas are healthy and in a Synchronized state. This is functionally equivalent to an automatic failover.
o Forced manual failover (allowing potential data loss). This is the only form of failover that is possible if the target secondary replica is in asynchronous commit availability mode, or if it is not synchronized with the primary replica.
Warning: You should use this failover option in a disaster recovery situation only. If the primary replica is healthy and available, you should change the availability mode of the involved replicas to synchronous commit and then perform a planned manual failover.
For more information, see Perform a Forced Manual Failover of an Availability Group (http://msdn.microsoft.com/en-us/library/ff877957(SQL.110).aspx).
You must perform a manual failover if any of the following conditions are true about either the primary replica or the secondary replica that you want to fail over to:
Failover mode is set to manual.
Availability mode is set to asynchronous commit.
Replica resides on an FCI.
For more information, see Failover Modes (AlwaysOn Availability Groups) (http://msdn.microsoft.com/en-us/library/hh213151(SQL.110).aspx).
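The failover rules above can be condensed into a function that lists which failover types are open for a given primary and target replica. This is a hypothetical sketch of the stated rules only: it assumes a Synchronized state implies synchronous commit on both replicas, it treats forced manual failover as always available (assuming the WSFC cluster itself is online), and it does not model quorum requirements for manual failover.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaConfig:
    automatic_failover: bool      # failover mode
    synchronous_commit: bool      # availability mode
    synchronized: bool = True     # synchronization state is Synchronized
    on_fci: bool = False          # replica hosted by a failover cluster instance
    healthy: bool = True


def failover_options(primary: ReplicaConfig, target: ReplicaConfig,
                     quorum_healthy: bool) -> List[str]:
    """List the failover types the rules above permit for primary -> target."""
    pair = (primary, target)
    options: List[str] = []
    if (quorum_healthy and target.synchronized
            and all(r.automatic_failover and r.synchronous_commit and not r.on_fci
                    for r in pair)):
        options.append("automatic")
    if (primary.healthy and target.healthy and target.synchronized
            and all(r.synchronous_commit for r in pair)):
        options.append("planned manual")
    options.append("forced manual")  # possible, but may lose data
    return options


sync = ReplicaConfig(automatic_failover=True, synchronous_commit=True)
async_target = ReplicaConfig(automatic_failover=False, synchronous_commit=False,
                             synchronized=False)
print(failover_options(sync, sync, quorum_healthy=True))
# ['automatic', 'planned manual', 'forced manual']
print(failover_options(sync, async_target, quorum_healthy=True))
# ['forced manual']
```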
Note: After a failover, if the new primary replica is not set to synchronous commit mode, the secondary replicas will indicate a Suspended synchronization state. No data will flow to the secondary replicas until the primary replica is set to synchronous commit mode.
Availability Group Listener
An availability group listener is a WSFC virtual network name (VNN) that clients can use to access a database in the availability group. The VNN cluster resource is owned by the SQL Server instance on which the primary replica resides.
The virtual network name is registered with DNS only during availability group listener creation or during configuration changes. All virtual IP addresses that are defined in the availability group listener are registered with DNS under the same virtual network name.
To use the availability group listener, a client connection request must specify the virtual network name as the server, and a database name that is in the availability group. By default, this should result in a connection to the SQL Server instance that is hosting the primary replica.
At run time, the client uses its local DNS resolver to get a list of IP addresses and TCP ports that map to the virtual network name. The client then attempts to connect to each of the IP addresses until it is successful or until it reaches the connection timeout. The client will attempt to make these connections in parallel if the MultiSubnetFailover parameter is set to true, enabling much faster client failovers.
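The benefit of parallel connection attempts can be estimated with a simple timing model. This is a hypothetical sketch, not the client driver's actual algorithm: it assumes an offline address costs a fixed per-attempt timeout, an online address connects after a fixed latency, and the 21-second per-attempt timeout, 30-second login timeout, and addresses are arbitrary illustrative values.

```python
from typing import List, Optional, Tuple

# (ip, seconds_to_connect); None means the address is offline.
Address = Tuple[str, Optional[float]]


def sequential_connect(addresses: List[Address], attempt_timeout: float,
                       login_timeout: float) -> Tuple[Optional[str], float]:
    """Try each address in turn; an offline address costs attempt_timeout."""
    elapsed = 0.0
    for ip, latency in addresses:
        if latency is not None:
            elapsed += latency
            return (ip, elapsed) if elapsed <= login_timeout else (None, login_timeout)
        elapsed += attempt_timeout
        if elapsed >= login_timeout:
            return None, login_timeout
    return None, elapsed


def parallel_connect(addresses: List[Address],
                     login_timeout: float) -> Tuple[Optional[str], float]:
    """Try all addresses at once (MultiSubnetFailover=True); fastest wins."""
    online = [(latency, ip) for ip, latency in addresses if latency is not None]
    if not online:
        return None, login_timeout
    latency, ip = min(online)
    return (ip, latency) if latency <= login_timeout else (None, login_timeout)


# Hypothetical listener whose first DNS answer is the offline subnet.
answers = [("10.0.1.50", None), ("10.0.2.50", 0.5)]
print(sequential_connect(answers, attempt_timeout=21.0, login_timeout=30.0))
# ('10.0.2.50', 21.5)
print(parallel_connect(answers, login_timeout=30.0))
# ('10.0.2.50', 0.5)
```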
In the event of a failover, client connections are reset on the server, ownership of the availability group listener moves with the primary replica role to a new SQL Server instance, and the VNN endpoint is bound to the new instance's virtual IP addresses and TCP ports.
For more information, see Client Connectivity and Application Failover (http://msdn.microsoft.com/en-us/library/hh213417(SQL.110).aspx).
Application Intent Filtering
While connecting through the availability group listener, the application can specify whether its intent is to both read and write data or whether it will exclusively perform read-only operations. If not specified, the default application intent for the client is read-write.
For the primary role and secondary role of each availability replica, you can also specify a connection access property that is used as a connection-level filter on the client's application intent. By default, invalid combinations of application intent and connection access result in a refused connection. SQL Server filters client connection requests using the following rules.
While the availability replica is in the primary role, and connection access is equal to:
Allow any application intent. Do not filter any client connections for application intent.
Allow only explicit read/write intent. If the client specifies read-only, reject the connection.
While the availability replica is in the secondary role, and connection access is equal to:
No connections allowed. Refuse all connections; the replica is used only for disaster recovery.
Allow any application intent. Do not filter any client connections for application intent.
Read-only application intent. If the client does not specify read-only, reject the connection.
For more information, see Configure Connection Access on an Availability Replica (http://msdn.microsoft.com/en-us/library/hh213002(SQL.110).aspx).
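The filtering rules above can be summarized as a small decision function. This Python sketch is hypothetical: the enum and function names are invented, and the rules it encodes are only the ones listed in this section, with an unspecified intent treated as read-write as described earlier.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class Intent(Enum):
    READ_WRITE = "ReadWrite"
    READ_ONLY = "ReadOnly"


class Access(Enum):
    ALLOW_ANY = "allow any application intent"          # primary or secondary role
    READ_WRITE_ONLY = "allow only explicit read/write"  # primary role only
    NO_CONNECTIONS = "no connections allowed"           # secondary role only
    READ_ONLY_INTENT = "read-only application intent"   # secondary role only


def is_connection_allowed(role: Role, access: Access, intent: Optional[Intent]) -> bool:
    """Apply the filtering rules; an unspecified intent defaults to read-write."""
    effective = intent if intent is not None else Intent.READ_WRITE
    if role is Role.PRIMARY:
        if access is Access.ALLOW_ANY:
            return True
        if access is Access.READ_WRITE_ONLY:
            return effective is Intent.READ_WRITE
    else:
        if access is Access.NO_CONNECTIONS:
            return False
        if access is Access.ALLOW_ANY:
            return True
        if access is Access.READ_ONLY_INTENT:
            return effective is Intent.READ_ONLY
    raise ValueError(f"{access.value!r} is not a valid setting for the {role.value} role")


# A reporting client that does not declare read-only intent is refused by a read-intent-only secondary.
print(is_connection_allowed(Role.SECONDARY, Access.READ_ONLY_INTENT, None))  # False
```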
Application Intent Read-Only Routing
A key value proposition for AlwaysOn Availability Groups is the ability to leverage your standby hardware infrastructure for purposes other than disaster recovery. By configuring one or more of your secondary replicas for read-only access, you can offload significant workloads from your primary replicas.
Workloads that can be readily adapted to run on a read-only secondary replica include reporting, database backups, database consistency checks, index fragmentation analysis, data pipeline extraction, operational support, and ad hoc queries.
For each availability replica, you can optionally configure a sequential read-only routing list of SQL Server instance endpoints to be applied while that replica is in the primary role. If present, this list is used to redirect client connection requests that specify read-only application intent to the first available secondary replica in the list that satisfies the application intent filters noted earlier.
Note: The read-only routing redirection is performed by the availability group listener, which is bound to the primary replica. If the primary replica is offline, client redirection will not function.
For more information, see Configure Read-Only Routing on an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh653924(SQL.110).aspx).
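A minimal sketch of the routing decision described above, assuming the listener already knows whether each listed secondary is online and whether its connection access admits read-only intent. `SecondaryStatus`, `route_connection`, and the `contoso.com` endpoints are hypothetical. The sketch returns None when no listed secondary qualifies; what SQL Server itself does in that case is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class SecondaryStatus:
    """Hypothetical snapshot of a secondary replica, as seen from the primary."""
    online: bool
    accepts_read_only: bool  # its secondary-role connection access admits read-only intent


def route_connection(
    read_only_intent: bool,
    primary_endpoint: str,
    routing_list: Sequence[str],
    status: Dict[str, SecondaryStatus],
) -> Optional[str]:
    """Return the endpoint this client would be sent to, or None if no routing target is found."""
    if not read_only_intent:
        return primary_endpoint  # only read-only intent is redirected
    for endpoint in routing_list:  # walked in order; the first qualifying secondary wins
        s = status.get(endpoint)
        if s is not None and s.online and s.accepts_read_only:
            return endpoint
    return None  # no qualifying secondary; real product behavior is outside this sketch


routing_list = ["tcp://sql2.contoso.com:1433", "tcp://sql3.contoso.com:1433"]
status = {
    "tcp://sql2.contoso.com:1433": SecondaryStatus(online=False, accepts_read_only=True),
    "tcp://sql3.contoso.com:1433": SecondaryStatus(online=True, accepts_read_only=True),
}
print(route_connection(True, "tcp://sql1.contoso.com:1433", routing_list, status))
```

Here the first entry is offline, so the read-only request falls through to the second entry in the list.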
Availability Improvements: Databases
SQL Server 2012 has a number of feature enhancements that are specific to database configuration and capabilities.
The following improvement reduces recovery time:
Predictable Recovery Time. You can set a target recovery time interval per database, which is used to control the scheduling of a background CHECKPOINT command. This indirect checkpoint occurs periodically, based upon the estimated time needed to recover the transaction log in the event of a restart or failover. This has the effect of smoothing I/O out to roughly equal proportions for each checkpoint, and increasing the predictability of recovery time (RTO).
Prior to SQL Server 2012, background CHECKPOINT commands were issued on a fixed interval, irrespective of transaction volume or load, which could lead to unpredictable recovery times.
For more information, see Database Checkpoints (http://msdn.microsoft.com/en-us/library/ms189573(SQL.110).aspx).
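To see why scheduling checkpoints by estimated recovery time makes recovery more predictable, consider a deliberately crude toy model in Python. It is not SQL Server's checkpoint algorithm: it assumes recovery time is simply the log generated since the last checkpoint divided by a fixed redo rate, and the function name and workload numbers are invented.

```python
from typing import Sequence


def worst_recovery_estimate(
    log_mb_per_second: Sequence[float],
    redo_mb_per_second: float,
    fixed_interval_s: int = 0,
    target_recovery_s: float = 0.0,
) -> float:
    """Toy model: worst estimated recovery time (seconds) seen over a run.

    Pass fixed_interval_s to checkpoint every N seconds regardless of load, or
    target_recovery_s to checkpoint whenever the estimate reaches the target.
    """
    pending_mb = 0.0  # log generated since the last checkpoint (what recovery would redo)
    worst = 0.0
    for second, mb in enumerate(log_mb_per_second, start=1):
        pending_mb += mb
        estimate = pending_mb / redo_mb_per_second
        worst = max(worst, estimate)
        if fixed_interval_s and second % fixed_interval_s == 0:
            pending_mb = 0.0  # time-based checkpoint, blind to load
        elif target_recovery_s and estimate >= target_recovery_s:
            pending_mb = 0.0  # checkpoint triggered by the recovery-time estimate
    return worst


burst = [1, 1, 1, 20, 20, 20, 1, 1]  # hypothetical MB of log per second, with a load spike
print(worst_recovery_estimate(burst, redo_mb_per_second=10, fixed_interval_s=4))     # 4.2
print(worst_recovery_estimate(burst, redo_mb_per_second=10, target_recovery_s=1.0))  # 2.3
```

Under the fixed interval, the worst case depends on how much log happens to arrive between checkpoints. The target-driven policy reacts to the burst by checkpointing more often, so its worst case stays lower; it still overshoots the 1-second target here only because the toy re-evaluates once per simulated second.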
These improvements mitigate common scenarios that can drive planned downtime:
Online index operations for LOB columns. Indexes that contain columns with varbinary(max), varchar(max), nvarchar(max), or XML data types can now be rebuilt or reorganized online.
Online schema modification for new NOT NULL columns. If a new NOT NULL column is added with a default value to a SQL Server 2012 database table, only a schema lock is required to update system metadata; existing rows do not have to be populated during the ALTER TABLE statement.
SQL Server will physically persist the default column value only if a row is actually modified or re-indexed. Queries return the default value from metadata, unless an actual column value exists.
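The metadata-only column addition can be illustrated with a toy in-memory table. This Python sketch is hypothetical and greatly simplified (`ToyTable` and its methods are invented, and rows are plain dictionaries). It models only the behavior described above: adding the column touches metadata, reads fall back to the metadata default, and the default is persisted when a row is modified.

```python
from typing import Any, Dict, List


class ToyTable:
    """Toy model of a metadata-only NOT NULL column add; not SQL Server's storage format."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows
        self.metadata_defaults: Dict[str, Any] = {}  # column name -> default kept in "metadata"

    def add_not_null_column(self, name: str, default: Any) -> None:
        # Only metadata changes; no existing row is rewritten, so cost does not grow with row count.
        self.metadata_defaults[name] = default

    def read(self, row_index: int, column: str) -> Any:
        row = self.rows[row_index]
        # A stored value wins; otherwise the default comes from metadata.
        return row[column] if column in row else self.metadata_defaults[column]

    def update(self, row_index: int, column: str, value: Any) -> None:
        row = self.rows[row_index]
        for name, default in self.metadata_defaults.items():
            row.setdefault(name, default)  # defaults are persisted only when the row is modified
        row[column] = value


orders = ToyTable([{"id": 1}, {"id": 2}])
orders.add_not_null_column("status", "open")
print(orders.read(0, "status"), "status" in orders.rows[0])  # open False
orders.update(1, "id", 20)
print(orders.rows[1])  # {'id': 20, 'status': 'open'}
```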
The following improvement broadens support for storage scenarios:
Automatic Page Repair. Certain types of storage subsystem errors can corrupt a data page, making it unreadable. AlwaysOn Availability Groups can detect and automatically recover from these types of errors by asynchronously requesting and applying a fresh copy of the affected data pages from a different availability replica.
Similar functionality existed prior to SQL Server 2012 for database mirroring, but it is now enhanced to support multiple replicas.
For more information, see Automatic Page Repair (http://msdn.microsoft.com/en-us/library/bb677167(SQL.110).aspx).
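The detect-and-repair idea behind Automatic Page Repair can be sketched with CRC-32 checksums from Python's `zlib` module. This is a toy under stated assumptions, not SQL Server's implementation: `ToyReplica` is hypothetical, pages are byte strings assumed to be identical across replicas, and the repair here happens synchronously inside the read, whereas the text describes the real feature as asynchronous.

```python
import zlib
from typing import Dict, Sequence


class ToyReplica:
    """Toy replica holding page images plus the checksum recorded when each page was written."""

    def __init__(self, name: str, pages: Dict[int, bytes]) -> None:
        self.name = name
        self.pages = dict(pages)
        self.checksums = {page_id: zlib.crc32(data) for page_id, data in pages.items()}

    def is_intact(self, page_id: int) -> bool:
        return zlib.crc32(self.pages[page_id]) == self.checksums[page_id]

    def read_page(self, page_id: int, partners: Sequence["ToyReplica"]) -> bytes:
        if self.is_intact(page_id):
            return self.pages[page_id]
        # Corrupt page: copy an intact image from another replica over the local one.
        for partner in partners:
            if page_id in partner.pages and partner.is_intact(page_id):
                self.pages[page_id] = partner.pages[page_id]
                return self.pages[page_id]
        raise IOError(f"page {page_id} on {self.name} is corrupt and no replica has a good copy")


pages = {7: b"page 7 contents"}
primary = ToyReplica("SQL1", pages)
secondary = ToyReplica("SQL2", pages)
secondary.pages[7] = b"page 7 c0ntents"  # simulate a storage error altering one byte
print(secondary.read_page(7, [primary]) == pages[7])  # True
```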
Client Connectivity Recommendations
FollowtheseguidelinestoenableclientapplicationstotakefulladvantageofMicrosoftSQLServer2012
AlwaysOntechnologies:
AlwaysOn-aware client library. Use a client library that supports the tabular data stream (TDS) protocol version 7.4 or newer. This should provide the desired client-side functionality for AlwaysOn features. Example client libraries include the Data Provider for SQL Server in .NET Framework 4.0.2 and SQL Server Native Client 11.0.
Connection provider property: MultiSubnetFailover=True. Use this keyword in your connection strings to enable client libraries to attempt to connect in parallel to all IP addresses that are registered for the availability group listener, or for an FCI that has IP addresses in multiple subnets.
Connection provider property: ApplicationIntent=ReadOnly. Where practical, use this keyword to offload read-only workloads from your primary replica onto the secondary replicas.
Legacy client connection timeout. Legacy client database libraries do not implement parallel connection attempts, so when multiple IP addresses are present, they try each address sequentially, waiting for each attempt to either succeed or reach a TCP timeout before moving on. Adjust the connection timeout on legacy clients to accommodate these potential sequential timeouts and retries, setting it to a value that is at least 15 seconds + 21 seconds for every secondary replica.
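The recommendations above can be sketched in Python: one helper assembles an ADO.NET-style connection string using the `MultiSubnetFailover` and `ApplicationIntent` keywords named in the text (plus the standard `Server` and `Database` keywords), and another applies the legacy timeout guideline. The listener name `aglistener.contoso.com`, the database name, and both function names are hypothetical; check your own driver's documentation for its exact keyword spelling.

```python
from typing import Dict


def build_connection_string(listener: str, database: str, read_only: bool = False) -> str:
    """Assemble an ADO.NET-style connection string for an availability group listener."""
    settings: Dict[str, str] = {
        "Server": listener,             # the listener's virtual network name
        "Database": database,           # a database that is in the availability group
        "MultiSubnetFailover": "True",  # try all registered listener IPs in parallel
    }
    if read_only:
        settings["ApplicationIntent"] = "ReadOnly"  # eligible for read-only routing
    return ";".join(f"{key}={value}" for key, value in settings.items()) + ";"


def legacy_connect_timeout_seconds(secondary_replicas: int) -> int:
    """Guideline from the text: at least 15 s plus 21 s for every secondary replica."""
    if secondary_replicas < 0:
        raise ValueError("secondary_replicas must be non-negative")
    return 15 + 21 * secondary_replicas


print(build_connection_string("aglistener.contoso.com", "Sales", read_only=True))
print(legacy_connect_timeout_seconds(2))  # 57
```

For example, with two secondary replicas the guideline gives 15 + 2 × 21 = 57 seconds.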
Conclusion
This white paper has established the baseline context for how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.
Many of the business drivers and challenges of planning, managing, and measuring a highly available database environment can be quantified and expressed as Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).
SQL Server 2012 AlwaysOn provides capabilities at the infrastructure, data platform, and database level that can help your organization address common high availability and disaster recovery scenarios, in a manner that can be well justified using RPO and RTO goals.
For more information:
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5
(excellent), how would you rate this paper and why have you given it this rating? For example:
Are you rating it high due to having good examples, excellent screen shots, clear writing,
or another reason?
Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback will help us improve the quality of white papers we release.
Send feedback.
Version 1.1, 21 February 2012.