Accelerating Data Protection
Executive Summary
This white paper examines the challenges organizations face as they struggle to ensure that critical data and systems are protected in the event of widespread disruption. Network replication and periodic backups between remote sites are two of the most common ways to provide Business Continuity and Disaster Recovery (BCDR): if one site goes down, the organization can continue doing business using the replicated systems and data at the other site. The scalability of this model is being tested, however, by the rate at which data is growing; burgeoning digital assets threaten to outpace the ability of replication and backup implementations to keep up. At the same time, the pressure for aggressive Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs) is not letting up.

Among the solutions available, WAN optimization seems to hold a lot of promise, at least on paper. Unfortunately, because traditional WAN optimization was designed for end-user applications and branch networks, it has not been able to scale up to the levels necessary to ensure current and future data protection. To address the need for high-performance WAN acceleration, Infineta Systems has developed a set of new technologies specifically designed to improve the performance of inter-data center applications, including replication and backup. The Infineta solutions work at speeds of up to 10 Gbps and combine data reduction, TCP optimization, and Layer 4 QoS to allow organizations to move more data across the same WAN infrastructure in less time.
On November 17, 2010, Chuck Hollis, CTO of EMC Corporation, noted in his blog that EMC has 1,000+ customers in its Petabyte Club. On January 14, 2011, he forecast an Exabyte Club, likely in 2012, for EMC customers who have reached 1,000 petabytes of storage. http://chucksblog.emc.com/chucks_blog/2010/11/big-data-and-the-ever-expanding-emc-petabyte-club.html
WWW.INFINETA.COM
The answer to each of these is represented by metrics known as the Recovery Point Objective (RPO) and Recovery Time Objective (RTO). The IT organization is usually responsible for meeting the RPO and RTO while at the same time managing costs, a responsibility expressed through service-level agreements (SLAs) with the LOBs. In general, the lower the RPO and RTO requirements, the greater the cost of meeting them.

Setting the RPO and RTO to zero for all assets and all LOBs would be ideal: no data lost, no service interruptions, and no complaints. However, the expense and logistics of doing so can be unwieldy. To balance the cost of protection against the value of assets, the assets may be classified into criticality tiers. For example, a national retailer that records millions of transactions each month may determine that revenue-generating systems must always be available, that is, no transactions can be lost even in the event of a major disruption. Here, sales data and systems would be classified as Tier 1, with a service level agreement promising an RPO and RTO of zero.

Recovery Point Objective (RPO): The data restoration point, or maximum allowable time between the last backup and a disruptive event. Data generated after the last restoration point is subject to loss in the event of a disruption.

Recovery Time Objective (RTO): Restoration time, or maximum allowable time that a system can be non-functional.

Service Level Agreement (SLA): A commitment negotiated with a stakeholder to define the RPO and RTO for their particular line of business. SLAs can be challenging because, to be effective, they require an accurate assessment of which data is truly critical and the true cost of downtime for this and other types of data.

Criticality Tiers: Identification and classification of data according to its value to the organization.

Figure. A timeline from the last backup, through the disruptive event, to systems becoming available again; the RPO and RTO represent the balance between the maximum acceptable data loss and the cost of achieving that objective.

The implication is that all the transactional data and systems will simultaneously exist in two or more places. If any system, storage device, or even a whole site is compromised, the business will fail over seamlessly to the other system, data, or location. High costs typically mean that only the most critical data is classified as Tier 1. Less critical assets are classified as Tier 2 or Tier 3, but this can be a difficult negotiation.
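The trade-off between an RPO and the data actually at risk can be made concrete with a quick calculation. The sketch below is illustrative only; the transaction volume and the candidate RPO values are assumptions, not figures from this paper.

```python
# Estimate worst-case data loss implied by a given RPO.
# Assumed inputs (hypothetical): a retailer logging 3 million
# transactions per month, evaluated at several candidate RPOs.

TX_PER_MONTH = 3_000_000
TX_PER_SECOND = TX_PER_MONTH / (30 * 24 * 3600)  # ~1.16 tx/s

def worst_case_loss(rpo_seconds: float) -> int:
    """Transactions generated since the last restore point,
    all of which are subject to loss if a disruption hits."""
    return round(TX_PER_SECOND * rpo_seconds)

for label, rpo in [("RPO = 0 (synchronous replication)", 0),
                   ("RPO = 15 min (async replication/CDP)", 15 * 60),
                   ("RPO = 24 h (nightly backup)", 24 * 3600)]:
    print(f"{label}: up to {worst_case_loss(rpo)} transactions at risk")
```

Only an RPO of zero eliminates loss exposure entirely, which is why revenue-generating Tier 1 systems gravitate toward synchronous replication despite its cost.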
Additional statutes and stipulations in the USA regarding data protection and retention include: FINRA (Financial Industry Regulatory Authority, securities regulations), 21 CFR Part 11 (US Food and Drug Administration, record retention), the Gramm-Leach-Bliley Act (requirements for securing nonpublic consumer information), and FRCP (Federal Rules of Civil Procedure, e-discovery).
impetus has been post-September 11 economic concerns regarding the interdependence of major banking institutions.3 Among storage analysts and professionals, it is a commonly held belief that for most large organizations, the amount of data in storage is doubling every three to four years, and the amount of WAN traffic being transmitted between data centers is doubling every two years.4 Because inter-data center traffic is comprised primarily of BCDR data (i.e., Tier 1 and Tier 2 assets), expanding storage demands are challenging the ability of organizations to meet existing RPO/RTO commitments. The crux of the matter is that as the volume of critical data grows, with RPO/RTO goals staying the same, the pipe between data centers has to get bigger, fuller, or faster to handle the increases. The other alternative, of course, is to leave the pipe as is and protect less data. The most common solution thus far has been to enlarge the pipe through a series of bandwidth upgrades. As Figure 1 shows, however, this method cannot be expected to continue as storage (and the volume of Tier 1 data therein) grows from petabytes to exabytes in the coming years.

Figure 1. As storage demands grow from terabytes to petabytes to exabytes (across Tier 1, Tier 2, and Tier 3 data), so too does the amount of data requiring protection, which makes it increasingly difficult to meet the same RPO/RTO goals.
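The doubling rates cited above can be turned into a simple projection showing why serial bandwidth upgrades lose the race. The sketch below is illustrative; the ten-year horizon and the 10x upgrade step (e.g., 1 Gbps to 10 Gbps) are assumptions, while the two-year doubling period comes from the text.

```python
import math

# Project compounded WAN traffic growth from a doubling period,
# and compare it with the capacity gained from a link upgrade.

def growth_multiplier(years: float, doubling_period_years: float) -> float:
    """How many times larger a quantity becomes after `years`,
    given that it doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

# Inter-data center traffic doubling every 2 years (as cited above):
ten_year_traffic = growth_multiplier(10, 2)   # 32x today's volume

# A 10x link upgrade is consumed in log2(10) doubling periods:
years_to_exhaust_10x = math.log2(10) * 2      # ~6.6 years

print(f"Traffic after 10 years: {ten_year_traffic:.0f}x today's volume")
print(f"A 10x bandwidth upgrade is absorbed in {years_to_exhaust_10x:.1f} years")
```

In other words, even an order-of-magnitude upgrade buys well under a decade of headroom if traffic keeps doubling every two years.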
Next... The following sections look at the most widely used methods of copying critical assets from one location to another, and show why bandwidth is inherently limited as a means of keeping pace with growing storage demands. The paper then concludes with a look at how organizations can optimize their existing WAN links so more data can be transferred over the existing infrastructure, i.e., how the pipe can be made fuller and faster.
Forrester Research, Inc. May 2010. "The Future of Data Center Wide-Area Networking."
Replication
Replication can be synchronous, which means each byte of data is written to multiple locations before the application can write new data, or it can be asynchronous, which means the remote store will always be some amount of time (and thus data) behind the source. Because of its sensitivity to time, synchronous replication is usually restricted to sites within 100 km of each other,5 whereas asynchronous replication can occur at any distance and so is more common. As the distance between asynchronous sites grows, the data discrepancy between them also grows as a result of the increased latency. The difference can be anywhere from a few writes to many gigabytes of data, and at some point protecting data through replication becomes indistinguishable from protecting it through backups.
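The 100 km rule of thumb follows from the speed of light in fiber. The sketch below assumes a fiber refractive index of roughly 1.5 (so signals travel about 200 km per millisecond) and two round trips per acknowledged write; both are common approximations, not figures from this paper.

```python
# Estimate the write-latency penalty of synchronous replication
# as a function of the distance between data centers.

SPEED_IN_FIBER_KM_PER_MS = 200.0   # ~ c / 1.5 (assumed)
ROUND_TRIPS_PER_WRITE = 2          # assumed protocol exchange per write

def sync_write_penalty_ms(distance_km: float) -> float:
    """Propagation delay added to every acknowledged write."""
    rtt_ms = 2 * distance_km / SPEED_IN_FIBER_KM_PER_MS
    return ROUND_TRIPS_PER_WRITE * rtt_ms

for km in (10, 100, 500):
    print(f"{km:>4} km: +{sync_write_penalty_ms(km):.1f} ms per write")
```

At 100 km the propagation cost alone is about 2 ms per write; once equipment and serialization overhead are added, a total budget of around 5 ms (see the footnote below) is quickly exhausted, which is why synchronous replication rarely spans greater distances.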
Technique: Replication, synchronous
  SLA: RPO = 0; RTO = 0
  Characteristics: Bandwidth intensive; distance/latency limits

Technique: Replication, asynchronous
  SLA: RPO = minutes to hours; RTO = minutes to days
  Characteristics: Bandwidth intensive; calculated risk of data loss

Technique: Backup, continuous
  SLA: RPO = minutes to hours; RTO = hours to days
  Characteristics: Storage intensive; more data in jeopardy; higher RTO

Technique: Backup, snapshot
  SLA: RPO = hours to days; RTO = hours to days
  Characteristics: Storage intensive; designed around restore points; higher RTO

Table 1. Replication and backups are used for data protection.
Replication flows between data centers are not like branch-to-data-center traffic, which is usually comprised of many small, short-lived, low-RTT connections. Replication flows tend to have unique characteristics and require specialized resources to overcome the limitations imposed on them by the WAN.

High speed: Connection speeds can be as high as 1 Gbps per connection.
High volume: Replication traffic is constant and can total terabytes each day.
Few connections: Replication traffic uses relatively few connections compared to typical end-user scenarios (tens or hundreds of connections vs. thousands or tens of thousands).
Long-lived connections: Replication connections are persistent, lasting days or months, while application delivery connections are often created on a per-transaction basis.
Latency sensitive: Replication traffic is highly sensitive to latency.
Bursty: Data transmissions start and stop suddenly and frequently.
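Why these characteristics defeat stock TCP can be seen from the bandwidth-delay product: to keep one fast connection full, the sender must keep a full window of unacknowledged data in flight. The link speed and RTT below are illustrative assumptions.

```python
# Bandwidth-delay product: the TCP window needed to fill a link,
# and the throughput ceiling imposed by a too-small window.

def window_needed_bytes(link_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the pipe full."""
    return link_bps * rtt_s / 8

def throughput_ceiling_bps(window_bytes: float, rtt_s: float) -> float:
    """Maximum rate when the sender stalls waiting for ACKs."""
    return window_bytes * 8 / rtt_s

LINK = 1e9     # a 1 Gbps replication connection (assumed)
RTT = 0.070    # 70 ms coast-to-coast round-trip time (assumed)

print(f"Window to fill 1 Gbps at 70 ms RTT: "
      f"{window_needed_bytes(LINK, RTT)/1e6:.2f} MB")
print(f"Ceiling with a classic 64 KB window: "
      f"{throughput_ceiling_bps(64*1024, RTT)/1e6:.1f} Mbps")
```

A single 1 Gbps flow at 70 ms needs almost 9 MB in flight, while an unscaled 64 KB window caps the same flow near 7.5 Mbps, which is why a handful of long-lived, latency-sensitive replication connections behave so differently from thousands of small branch-office flows.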
Synchronous replication requires that new data be written and confirmed in two locations before the next new data will be accepted. The total latency budget, from endpoint to endpoint, for synchronous replication usually cannot exceed five milliseconds, i.e., a distance of roughly 100 km.
Multi-hop replication
Some organizations have found another path forward by using a mix of synchronous and asynchronous replication in conjunction with multiple levels of backup, the so-called multi-hop strategy. Figure 3 illustrates how multi-hop replication can be used to work around the limits imposed by latency. Synchronous replication is used to ensure that Tier 1 data exists in multiple locations at the same time. Asynchronous replication is then used to copy the data from the intermediary site to a more distant location. Backup applications can also be employed to capture and move snapshots of the remaining data to remote locations to ensure that Business Continuance Volumes (BCVs) are available for all data tiers. Although multi-hop seems like a workable, short-term solution to the problem of keeping up with data growth, it is equipment-intensive, and its scalability is unclear. Solution-specific idiosyncrasies can also make managing the arrangement a challenge for IT staff.
Backups
Backup strategies are used to create a restore point for critical applications and data sets. In the event of disruption, any data generated after that point is subject to loss. For decades, backups were made to on-site tape drives, and the tapes were then hand-carried to an offsite location for storage. Tape backup tends to be slow, delivery to the off-site location is often best-effort, and the tape media itself may not survive the lifespan of the data stored on it. These issues, along with aggressive RPOs and RTOs, more Tier 1 systems and data, and improved Internet connections, have driven most organizations to turn to network-based backups, which include data snapshots, Continuous Data Protection (CDP), and periodic backups.
Vendor: EMC
  Synchronous replication: SRDF/S, RecoverPoint
  Asynchronous replication: SRDF/A, RecoverPoint
  Backup: SRDF/DM

Vendor: HDS
  Synchronous replication: True Copy Synchronous
  Asynchronous replication: True Copy Extended Distance
  Backup: Hitachi Dynamic Replicator

Vendor: IBM
  Synchronous replication: Metro Mirror (PPRC, or Peer-to-Peer Remote Copy)
  Asynchronous replication: Global Copy
  Backup: Global Mirror

Vendor: HP
  Synchronous replication: HP Continuous Access
  Asynchronous replication: HP Continuous Access
  Backup: HP Continuous Access, HP StorageWorks Enterprise Backup Solutions (EBS)

Vendor: NetApp
  Synchronous replication: SnapMirror
  Asynchronous replication: SnapMirror
  Backup: SnapProtect

Vendor: CommVault
  Backup: Simpana SnapProtect

Vendor: Symantec
  Backup: NetBackup
Table 2. The Infineta DMS optimizes the WAN to accelerate high-speed data protection solutions.
Each of the network-based options has its own advantages and disadvantages, but the one thing they all have in common is that the scope of their protection is ultimately constrained by the WAN. As the amount of data being stored grows, so too do backup requirements, until at some point, the limit for how much data can be backed up within the available time frame (called the backup window) is reached.6 Either the last backup will not be able to finish by the time the next one is set to begin, or the amount of data for backup will have to be reduced to fit the available backup window.
utilization on long-distance links. Although bandwidth upgrades may improve throughput, the efficacy is subject to diminishing returns.
Keeping up with a daily change rate of 5% on 2 petabytes of storage would require more than a day: two petabytes = 2,000,000,000,000,000 bytes; five percent of 2 quadrillion bytes = 100 TB, or 800 terabits of data. Transferring 800 Tb at 7 Gbps, the throughput between NYC and SF, would take about 32 hours.
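The footnote's arithmetic can be checked directly. Only the 7 Gbps NYC-SF throughput figure is taken from the text; the rest is unit conversion.

```python
# Verify the backup-window arithmetic for a 5% daily change rate
# on 2 PB of storage over a 7 Gbps link.

storage_bytes = 2e15                          # 2 petabytes
daily_change_bytes = 0.05 * storage_bytes     # 100 TB changed per day
daily_change_bits = daily_change_bytes * 8    # 800 terabits

link_bps = 7e9                                # NYC-SF throughput (from text)
transfer_hours = daily_change_bits / link_bps / 3600

print(f"Changed data per day: {daily_change_bits/1e12:.0f} Tb")
print(f"Transfer time at 7 Gbps: {transfer_hours:.0f} hours")
```

At roughly 32 hours per daily delta, the backup can never finish before the next one is due, which is exactly the backup-window collision described above.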
The DMS is specifically designed to meet the unique demands of high-performance, low-latency traffic such as replication and backup workflows. It creates a fuller, faster pipe for data transfers and provides an alternative to the cycle of upgrading WAN bandwidth to keep pace with growing storage. In short, the DMS provides:

High per-connection throughput: Accelerates a single TCP connection all the way up to 1 Gbps, which is critical for the traffic bursts of modern replication workflows.
Optimization for high-latency WAN links: Fills the existing WAN link regardless of distance by employing highly aggressive start and recovery algorithms.
Very low port-to-port latency: Averages 50 microseconds of latency, which means acceleration for both synchronous and asynchronous traffic between data centers.
Complete protection against packet loss: Protects TCP connection speeds by handling packet loss more efficiently than the endpoint could.
Packet ordering: Ensures packets are delivered to the endpoint in the correct order, eliminating the need for retransmits.
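The value of handling packet loss off the endpoint can be estimated with the well-known Mathis model for loss-limited TCP throughput, rate ~ 1.22 x MSS / (RTT x sqrt(p)). This is a general model, not a description of the DMS internals, and the MSS, RTT, and loss-rate values below are illustrative assumptions.

```python
import math

# Mathis et al. approximation for the steady-state throughput
# of a standard loss-responsive TCP connection.

def mathis_throughput_bps(mss_bytes: int, rtt_s: float,
                          loss_rate: float) -> float:
    """Loss-limited TCP throughput ceiling, in bits per second."""
    return (mss_bytes * 8 * 1.22) / (rtt_s * math.sqrt(loss_rate))

MSS = 1460      # typical Ethernet MSS in bytes (assumed)
RTT = 0.070     # 70 ms inter-data center RTT (assumed)

for p in (1e-6, 1e-4, 1e-2):
    mbps = mathis_throughput_bps(MSS, RTT, p) / 1e6
    print(f"loss rate {p:.0e}: ceiling ~{mbps:,.1f} Mbps")
```

Even one lost packet in a million caps a standard connection near 200 Mbps at 70 ms RTT, far below a 1 Gbps replication target, which is why recovering from loss without collapsing the sender's rate matters so much for inter-data center flows.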
Contact Information
2870 Zanker Road, Suite 200 San Jose, CA 95134 Phone: (408) 514-6600 Sales / Customer Support: (866) 635-4049 Fax: (408) 514-6650 General inquiries: [email protected] Sales inquiries: [email protected]
2012 Infineta Systems, Inc. All rights reserved. No portion of the Documentation may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written permission of Infineta. Infineta disclaims all responsibility for any typographical, technical, or other inaccuracies, errors, or omissions in the Documentation. Infineta reserves the right, but has no obligation, to modify, update, or otherwise change the Documentation from time to time. Infineta, Infineta Systems, Data Mobility Switch, and Velocity Dedupe Engine are trademarks or registered trademarks of Infineta Systems, Inc., in the U.S. All other product and company names herein may be trademarks of their respective owners.