
Storage

Types of Storage
There are many types of storage media:
• Flash, which has become a cheap form of fast storage, especially in consumer products.
• Optical storage, which comes in the form of CDs, DVDs, Blu-ray discs, etc. These are
slow for data access, but still very useful for archives and movies.
• Magnetic tape backup systems, which are still in use in corporate IT centres, but are
slow and aren’t good for random access. Random access refers to the ability to
(effectively) access any piece of data by its address (e.g. block number on a hard disk –
see below) instantly.
• Magnetic or hard disks, which are discussed at length below, are a ubiquitous form of
high-volume, high-speed, random-access storage.
• A solid-state drive (SSD) is a data storage device that uses solid-state memory to
store persistent data. Unlike flash-based memory cards, an SSD emulates a hard disk
drive, thus easily replacing it in most applications. An SSD using SRAM or DRAM
(instead of flash memory) is often called a RAM drive. The advantage over (magnetic)
disk drives is speed, but the cost per gigabyte is 4 to 5 times that of disk drives, and at
the moment the amount of storage is much less per unit.

These types of storage can be either fixed or removable, thanks to the ubiquity of USB
and FireWire. In this paper we’ll be talking exclusively about magnetic disk (as in hard-
drive) storage.

Hard Disk

Hard disks are composed of:


• Multiple spinning magnetic platters that contain the magnetically encoded data.
The platters spin around an axle called a spindle, and a single drive is sometimes
referred to as a spindle for that reason.
• Read/write heads that float above the surface of the platters, usually two (top and
bottom) for each platter. The heads all move in unison.
• A controller board that drives the heads and converts I/O requests
(commands) into head movement and read/write operations.
• An interface that is joined to a host adapter board, which for an internal (to a
computer) drive is a separate unit.
• Power, which for an internal drive is usually supplied by the
computer’s own power supply.

The disk drive is covered and hermetically sealed, since heads are designed to float
above the disk platter with less than a micron of space.

Each surface of a disk platter is divided into areas. In the context of data, the word
'block' can have many different meanings, but in this context a block is the smallest unit
of data that is read and written. A block is also called (more formally) a 'track sector' or
simply 'sector'. A series of sectors makes up a track, and there are several tracks on a
surface. At the lowest level, when a block (or segment) is being read or written, several
things identify where that block goes: the surface (identified by the head number), the
track that the head should move to, and the sector that should be read or written. This
lowest level of data storage is called 'block-level' storage and implies that the data is
composed of a series of bits, with the drive having no notion of format, or of what the
data belongs to. In the operating system (OS), device drivers, file systems and
applications impose the meaning on, and keep tabs on, the individual blocks of data.
This higher level of data storage is called 'file-level' storage.

Seek time is one of the three delays associated with reading or writing data on a disk
drive. The others are the rotational delay of the disk and the transfer time; their sum is
the access time. In order to read or write data in a particular sector, the head of the disk
needs to be physically moved to the correct place. This process is known as seeking,
and the time it takes for the head to move to the right place is the seek time. Seek time
for a given disk varies depending on how far the head's destination is from its origin at
the time of each read or write; usually one discusses a disk's average seek time.
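
To make the relationship between these delays concrete, here is a minimal sketch (not from the source) that estimates the average access time for one block; the 9 ms average seek time, 7,200 RPM spindle speed, 150 MB/s transfer rate and 4 KB block size are assumed figures for illustration only.

def average_access_time_ms(avg_seek_ms, rpm, transfer_mb_s, block_kb=4):
    # Average rotational delay is half of one full revolution.
    rotational_delay_ms = (60_000 / rpm) / 2
    # Time to transfer one block of block_kb kilobytes.
    transfer_ms = (block_kb / 1024) / transfer_mb_s * 1000
    return avg_seek_ms + rotational_delay_ms + transfer_ms

print(average_access_time_ms(9.0, 7200, 150))   # roughly 13 ms, dominated by seek and rotation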

Host Controller Card

For disk drives that are to be installed internally to a computer (such as a server), the
interface from the disk will be cabled to a Host Controller Card (or simply controller).
Depending on the type of controller, this can usually accommodate multiple disk drives.
You may also be able to plug in external devices to an external-facing interface. The
controller will usually plug straight into a slot on the computer motherboard and draw its
power from there. This will also let the CPU talk to the host adapter and disks through
the system “bus”. Sometimes you may find the controller actually integrated into the
computer’s motherboard.

Types of Disk Interfaces

The predominant interfaces for disk drives are:


• Advanced Technology Attachment (ATA): This is more common in home computing,
and can support different kinds of devices, such as hard drives and DVD burners. There
is a restriction of two devices per cable. This is also known as Parallel ATA, since there
is a wire for each bit of information in the interface cable, making it very wide. The disk
drives that are attached to an ATA host adapter are usually called IDE drives
(Integrated Drive Electronics).
• Serial ATA (SATA): An enhancement to ATA that allows for changing drives
without shutting down (hot swapping), faster transfer speeds, and thinner cabling.
Generally, disks that attach through either ATA or SATA have their platters
spinning at a constant speed of 7,200 revolutions per minute (RPM). Remember that
the disk spin speed is one important factor in the disk's access time. The other
common interfaces are:
• Small Computer System Interface (SCSI): An interface standard that is not compatible
with ATA or IDE drives. Modern versions of SCSI afford up to 16 devices per cable,
including the host adapter. Although the layout looks like ATA, none of the components
are interchangeable.
• Serially Attached SCSI (SAS): A point-to-point serial protocol that replaces the parallel
SCSI bus technology mentioned above. It uses the standard SCSI command set, but is
currently not faster than parallel SCSI. In the future, speeds are expected to double, and
there will also be the ability to use certain (slower) SATA drives on a SAS bus.

SCSI disks usually spin at 10,000 or 15,000 RPM. Because of this, and the more
complicated electronics, SCSI components are much more expensive than S/ATA.
However, SCSI disks are renowned for their speed of access, and data transfer.

Abstraction & Storage Networks


Fault Tolerance and RAID

Because disk drives are sophisticated mechanical devices, when they fail they tend to
take all the data with them. RAID defines several types of redundancy and efficiency
enhancements by clustering commonly available disks. For example:
• RAID 0: Striped set, no parity. Striping is where each successive block of information is
written to alternate disks in the array. RAID 0 still loses all data if a single disk in the
array fails, but is often used to get the increased read speed. The increase in read speed
comes from being able to move the read/write heads of the different drives containing
the sequential blocks simultaneously. Write speeds may also improve, since sequential
blocks can be written at the same time to the different disks in the array.
• RAID 1: Mirroring, no parity. Mirroring is where each block is duplicated across all
disks in the array. Here, any one disk failure will not impact data integrity. Better read
speeds are achieved by using the drive whose read/write head is closest to the track
containing the block to be read. There is generally no improvement in write speeds.
• RAID 5: Striped set with distributed parity. The advantage here is that the data from
one drive can be rebuilt with the parity information contained on the other drives. RAID
5 can only afford 1 drive to fail.
These are the most common RAID levels, but there are other levels, and indeed
combinations of levels, that can be configured (http://en.wikipedia.org/wiki/RAID). RAID
can be implemented in the host controller, or built into the operating system. Either way,
with RAID we are beginning to see an abstraction of a physical disk into a logical one.
For example, with RAID 1, if we decided to use 2 identical 100GB disks for mirroring,
this would ultimately end up as a 100GB (not 200GB) logical disk to the OS. So:
• Traditionally, disk drives are called either Physical Volumes (PV) or Logical Volumes
(LV), depending on where in the infrastructure you are referring to them.
• A PV can be split up into partitions, where each partition can also look, to the
operating system, like an individual PV.
• A LUN (logical unit number) comes from the SCSI protocol, but more generally refers
to an LV in storage terminology.
• On some systems, Physical Volumes can be pooled into Volume Groups (VG), from
which Logical Volumes can be created. In this case a Logical Volume may stretch
across many different sizes and types of Physical Disks, and take advantage of RAID.
In a Linux system, this software management of disk storage is called the Logical
Volume Manager (LVM).
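
As a rough illustration of how RAID 5 rebuilds a lost drive from parity (a sketch, not any vendor's implementation), the snippet below computes the parity of a stripe as the XOR of its data blocks and then reconstructs a missing block; the three-disk stripe and its contents are made up.

def parity(blocks):
    # XOR all blocks together, byte by byte (blocks are assumed to be equal length).
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

stripe = [b"disk0data", b"disk1data", b"disk2data"]   # data blocks on three disks
p = parity(stripe)                                    # parity block stored on a fourth disk

# Simulate losing disk 1 and rebuilding its block from the survivors plus parity.
rebuilt = parity([stripe[0], stripe[2], p])
assert rebuilt == stripe[1]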

Directly Attached Storage (DAS)

It didn’t take long to see the appearance of disk cabinets, or disk arrays, connected to
servers via an external SCSI cable and managed separately. In some cases these
storage cabinets could be connected to multiple servers, so they could share the
storage (perhaps for fault tolerance). Also, being able to 'hot swap' failed disks and
have the unit rebuild that disk from the parity on the other disks was an expected feature.
This led to the acronym DAS, or Directly Attached Storage (the acronym was actually
coined more recently to distinguish it from other technologies). The main technology
used with DAS is SCSI, with specialized Host Bus Adapters (HBAs) installed in the
servers (more on HBAs later). A DAS afforded multiple-server access (up to 4) for
clustering, but the main disadvantage was that DAS ended up yielding an island of
information.
Storage Networking Protocols

Since the length of a SCSI cable is very limited, there came a need for low-level access
to storage over networks. In effect, the equivalent of stretching the permissible distance
of the SCSI cable to much larger distances. This led to the advancements in Storage
Networking Protocols. These protocols carry the same block-level SCSI commands that
go over the interface cables of a disk, and have no knowledge of how clusters of blocks
are aggregated (or used) by the OS to give us a file system. This gives us a network of
disk appliances, where each appliance is a fault-tolerant disk array with its own
management interface. The two predominant networking protocols used for Storage
Networks are the Fibre Channel Protocol (FCP) and iSCSI (over Gigabit Ethernet). In
these cases, both the Fibre Channel and the Gigabit Ethernet infrastructures are used
to carry SCSI commands over the network. iSCSI uses TCP/IP, whereas FCP has its
own 5-layer stack definition.

Gigabit Interface Converter (GBIC)

Often, in the physical implementation, port connections are made through a Gigabit
Interface Converter (GBIC). A GBIC is a standard for transceivers, commonly used with
Gigabit Ethernet and Fibre Channel (explained below). By offering a standard, hot
swappable electrical interface, a one gigabit Ethernet port, for example, can support a
wide range of physical media, from copper to long-wave single-mode optical fibre, at
lengths of hundreds of kilometres.

Fibre Channel
Fibre Channel was originally designed to support fiber optic cabling only. When copper
support was added, the Fibre Channel committee decided to keep the name in principle,
but to use the UK English spelling (Fibre) when referring to the standard. Fibre Channel
can use either optical fiber (for distance) or copper cable links (for short distances at low
cost). However, fiber-optic cables enjoy a major advantage in noise immunity.

Fibre Channel Infrastructure

Fibre Channel, or FC, is a gigabit-speed network technology primarily used for storage
networking, using Fibre Optics. There are 3 topologies that can be used:
• Point-to-Point (FC-P2P). Two devices are connected back to back. This is the
simplest topology, with limited connectivity.
• Arbitrated loop (FC-AL). In this design, all devices are in a loop or ring, similar to
token ring networking. Adding or removing a device from the loop causes all activity on
the loop to be interrupted. The failure of one device causes a break in the ring. Fibre
Channel hubs exist to connect multiple devices together and may bypass failed ports. A
loop may also be made by cabling each port to the next in a ring. A minimal loop
containing only two ports, while appearing to be similar to FC-P2P, differs considerably
in terms of the protocol.
• Switched fabric (FC-SW). All devices or loops of devices are connected to Fibre
Channel switches, similar conceptually to modern Ethernet implementations. The
switches manage the state of the fabric, providing optimized interconnections.

FC-SW is the most flexible topology, enabling all servers and storage devices to
communicate with each other. It also provides for failover architecture if a server or disk
array fails. FC-SW involves one or more intelligent switches, each providing multiple
ports for nodes. Unlike FC-AL, FC-SW bandwidth is fully scalable, i.e. there can be any
number of 8Gbps (Gigabits per second) transfers operating simultaneously through the
switch. In fact, if using full-duplex, each connection between a node and a switch port
can use 16Gbps bandwidth. Because switches can be cascaded and interwoven, the
resultant connection cloud has been called the fabric.
Fibre Channel Host Bus Adapters

Fibre Channel HBAs are available for all major open systems, computer architectures,
and buses. Some are OS dependent. Each HBA has a unique Worldwide Name (WWN,
or WWID for Worldwide Identifier), which is similar to an Ethernet MAC address in that it
uses an Organizationally Unique Identifier (OUI) assigned by the IEEE. However,
WWNs are longer (8 bytes). There are two types of WWNs on an HBA: a node WWN
(WWNN), which is shared by all ports on a host bus adapter, and a port WWN (WWPN),
which is unique to each port. Some Fibre Channel HBA manufacturers are Emulex, LSI,
QLogic and ATTO Technology.

Fibre Ports
The basic building block of the Fibre Channel is the port:
N_Port: This is a node port that is not loop capable. It is used to connect an equipment
port to the fabric.
NL_Port: This is a node port that is loop capable. It is used to connect an equipment
port to the fabric in a loop configuration through an L_Port or FL_Port.
FL_Port: This is a fabric port that is loop capable. It is used to connect an NL_Port to
the switch in a public loop configuration.
L_Port: This is a loop-capable node or switch port.
E_Port: This is an expansion port. A port is designated an E_Port when it is used as an
inter-switch expansion port (ISL) to connect to the E_Port of another switch, to enlarge
the switch fabric.
F_Port: This is a fabric port that is not loop capable. It is used to connect an N_Port
point-to-point to a switch.
G_Port: This is a generic port that can operate as either an E_Port or an F_Port. A port
is defined as a G_Port after it is connected but has not received a response to loop
initialization or has not yet completed the link initialization procedure with the adjacent
Fibre Channel device.
U_Port: This is a universal port—a more generic switch port than a G_Port. It can
operate as either an E_Port, F_Port, or FL_Port. A port is defined as a U_Port when it is
not connected or has not yet assumed a specific function in the fabric.
MTx_Port: CNT port used as a mirror for viewing the transmit stream of the port to be
diagnosed.
MRx_Port: CNT port used as a mirror for viewing the receive stream of the port to be
diagnosed.
SD_Port: Cisco SPAN port used for mirroring another port for diagnostic purposes.

Fibre Channel Zoning


Zoning allows for finer segmentation of the Fibre Channel fabric. Zoning can be used to
erect a barrier between different environments: only the members of the same zone
can communicate within that zone, and all attempts from outside are rejected.
Zoning could be used for:
Separating LUNs between Windows and other operating systems to avoid data
corruption
Security
Test & maintenance
Managing different user groups and objectives

Zoning can be implemented in one of two ways:


Hardware: Hardware zoning is based on the physical fabric port number. The members
of a zone are physical ports on the fabric switch. It can be implemented in the following
configurations:
One-to-one
One-to-many
Many-to-many
Software: Software zoning is implemented by the fabric operating system within the
fabric switches. It is almost always implemented by a combination of the name
server and the Fibre Channel Protocol: when a port contacts the name server, the
name server replies only with information about ports in the same zone as the
requesting port. A soft zone, or software zone, is not enforced by hardware (i.e.
hardware zoning). Usually, the zoning software also allows you to create symbolic
names for the zone members and for the zones themselves; dealing with a symbolic
name or alias for a device is often easier than trying to use the WWN address.
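
A minimal sketch of the soft-zoning behaviour described above, with made-up zone names and WWPNs: the name server only reports ports that share a zone with the requesting port.

zones = {
    "zone_windows": {"10:00:00:00:c9:aa:00:01", "50:06:01:60:10:20:30:40"},
    "zone_unix":    {"10:00:00:00:c9:bb:00:02", "50:06:01:61:10:20:30:41"},
}

def name_server_query(requesting_wwpn):
    # Reply only with members of the zones the requesting port belongs to.
    visible = set()
    for members in zones.values():
        if requesting_wwpn in members:
            visible |= members - {requesting_wwpn}
    return sorted(visible)

print(name_server_query("10:00:00:00:c9:aa:00:01"))   # only the Windows zone's other port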

iSCSI
iSCSI over Gigabit Ethernet
Ethernet has evolved into the most widely implemented physical and link layer protocol
today. Fast Ethernet increased speed from 10 to 100 megabits per second (Mbit/s).
Gigabit Ethernet was the next iteration, increasing the speed to 1000 Mbit/s. In the
marketplace, full duplex with switches is the norm. There are several physical layer
standards for Gigabit Ethernet, including:
Optical fiber (1000BASE-X)
Twisted pair cable (1000BASE-T)
Balanced copper cable (1000BASE-CX).

iSCSI (RFC3720) is a mapping of the regular SCSI protocol over TCP/IP, more
commonly over Gigabit Ethernet. Unlike Fibre Channel, which requires special-purpose
cabling, iSCSI can be run over long distances using an existing network infrastructure.
TCP/IP uses a client/server model, but iSCSI uses the terms initiator (for the data
consumer) and target (for the LUN).
• A software initiator uses code to implement iSCSI, typically as a device driver.
• A hardware initiator mitigates the overhead of iSCSI, TCP processing and Ethernet
interrupts, and therefore may improve the performance of servers that use iSCSI. An
iSCSI host bus adapter (HBA) implements a hardware initiator and is typically packaged
as a combination of a Gigabit Ethernet NIC, some kind of TCP/IP offload technology
(TOE) and a SCSI bus adapter (controller), which is how it appears to the operating
system.

iSCSI Naming & Addressing


Each initiator or target is known by an iSCSI Name which is independent of the location
of the initiator and target. iSCSI Names are used to provide:
An initiator identifier for configurations that provide multiple initiators behind a single IP
address.
A target identifier for configurations that present multiple targets behind a single IP
address and port.
A method to recognize multiple paths to the same device on different IP addresses and
ports.
An identifier for source and destination targets for use in third-party commands.
An identifier for initiators and targets to enable them to recognize each other regardless
of IP address and port mapping on intermediary firewalls.

The initiator presents both its iSCSI Initiator Name and the iSCSI Target Name to which
it wishes to connect in the first login request of a new session. The only exception is if a
discovery session is to be established; the iSCSI Initiator Name is still required, but the
iSCSI Target Name may be ignored. The default name "iSCSI" is reserved and is not
used as an individual initiator or target name. iSCSI Names do not require special
handling within the iSCSI layer; they are opaque and case-sensitive for purposes of
comparison. iSCSI provides three name-formats:
iSCSI Qualified Name (IQN), format: iqn.yyyy-mm.{reversed domain name}
o iqn.2001-04.com.acme:storage.tape.sys1.xyz
o iqn.1998-03.com.disk-vendor.diskarrays.sn.45678
o iqn.2000-01.com.gateways.yourtargets.24
o iqn.1987-06.com.os-vendor.plan9.cdrom.12345
o iqn.2001-03.com.service-provider.users.customer235.host90
Extended Unique Identifier (EUI), format: eui.{EUI-64 bit address}
o eui.02004567A425678D
T11 Network Address Authority (NAA), format: naa.{NAA 64 or 128 bit identifier}
o naa.52004567BA64678D

IQN format addresses occur most commonly, and are qualified by a date (yyyy-mm)
because domain names can expire or be acquired by another entity. iSCSI nodes (i.e.
the machine that contains the LUN targets) also have addresses. An iSCSI address
specifies a single path to an iSCSI node and has the format <domain-name>[:<port>],
where <domain-name> can be either an IP address in dotted decimal notation or a
Fully Qualified Domain Name (FQDN, or host name). If the <port> is not specified, the
default port of 3260 is assumed.
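
The sketch below (an illustration, not the RFC 3720 parsing rules) classifies an iSCSI name by its prefix and splits a node address into host and port, applying the default port of 3260; the example names are taken from the list above.

def classify_iscsi_name(name):
    # IQN, EUI and NAA names are distinguished by their prefix.
    for prefix, fmt in (("iqn.", "IQN"), ("eui.", "EUI"), ("naa.", "NAA")):
        if name.lower().startswith(prefix):
            return fmt
    raise ValueError("unknown iSCSI name format: " + name)

def parse_iscsi_address(address, default_port=3260):
    host, _, port = address.partition(":")
    return host, int(port) if port else default_port

print(classify_iscsi_name("iqn.2001-04.com.acme:storage.tape.sys1.xyz"))   # IQN
print(classify_iscsi_name("eui.02004567A425678D"))                         # EUI
print(parse_iscsi_address("192.168.1.20"))                                 # ('192.168.1.20', 3260)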
iSCSI Security
To ensure that only valid initiators connect to storage arrays, administrators most
commonly run iSCSI only over logically-isolated backchannel networks.
For authentication, iSCSI initiators and targets prove their identity to each other using
the CHAP protocol, which includes a mechanism to prevent cleartext passwords from
appearing on the wire. Additionally, as with all IP-based protocols, IPsec can operate at
the network layer. Though the iSCSI negotiation protocol is designed to accommodate
other authentication schemes, interoperability issues limit their deployment. An initiator
authenticates not to the storage array, but to the specific storage asset (target) it intends
to use. For authorization, iSCSI deployments require strategies to prevent unrelated
initiators from accessing storage resources. Typically, iSCSI storage arrays explicitly
map initiators to specific target LUNs.

iSCSI Zoning
Though there really isn't a zoning protocol associated with iSCSI, VLANs can be
leveraged to accomplish the segregation needed (http://en.wikipedia.org/wiki/Vlan).

Storage Architectures
DAS, NAS & SAN

The emergence of these Storage Networking Protocols has led to the development of
different types of storage architectures, depending on the needs:
• We talked earlier about the Directly Attached Storage (DAS).
• Network Attached Storage (NAS): First conceived by Novell, but more commonly seen
in MS LAN Manager (CIFS) and NFS (predominant in the UNIX/Linux worlds), all of
which serve up file shares. These days it's more common to see a NAS appliance, which
is essentially a self-contained computer connected to a network, with the sole purpose of
supplying file-based data storage services to other devices on the network. Due to its
multiprotocol nature, and the reduced CPU and OS layer, a NAS appliance as such
has its limitations compared to the FC/GbE systems. This is known as file-level storage.
• Storage Area Network (SAN) is an architecture to attach remote storage devices (such
as disk arrays, tape libraries and optical jukeboxes) to servers in such a way that, to the
OS, the devices appear as locally attached. That is, the storage acts to the OS like it
was attached with an interface cable to a locally installed host adapter. This is known as
block-level storage.

Interestingly, Auspex Systems was one of the first to develop a dedicated NFS
appliance for use in the UNIX market. A group of Auspex engineers split away in the
early 1990s to create the integrated NetApp filer, which supported both CIFS for
Windows and NFS for UNIX, and had superior scalability and ease of deployment. This
started the market for proprietary NAS devices.
Hybrid
What if the NAS uses the SAN for storage? A NAS head refers to a NAS which does not
have any on-board storage, but instead connects to a SAN. In effect, it acts as a
translator between the file-level NAS protocols (NFS, CIFS, etc.) and the block-level
SAN protocols (Fibre Channel Protocol, iSCSI). Thus it can combine the advantages of
both technologies.

Tiered storage
Tiered storage is a data storage environment consisting of two or more kinds of storage
delineated by differences in at least one of these four attributes: Price, performance,
capacity and function. In mature implementations, the storage architecture is split into
different tiers. Each tier differs in the:
Type of hardware used
Performance of the hardware
Scale factor of that tier (amount of storage available)
Availability of the tier and policies at that tier

A very common model is to have a primary tier of expensive, high-performance and
limited storage. Secondary tiers typically comprise less expensive storage media and
disks, and can either host data migrated (or staged) from the primary tier by lifecycle
management software, or host data saved directly to the secondary tier by application
servers and workstations whose storage needs did not warrant primary-tier access.
Both tiers are typically serviced by a backup tier, where data is copied into long-term
and offsite storage. In this context, you may hear two terms:
• ILM – Information Lifecycle Management refers to a wide-ranging set of strategies for
administering storage systems on computing devices.
• HSM – Hierarchical Storage Management is a data storage technique which
automatically moves data between high-cost and low-cost storage media. HSM systems
exist because high-speed storage devices, such as hard disk drive arrays, are more
expensive (per byte stored) than slower devices, such as optical discs and magnetic
tape drives. http://en.wikipedia.org/wiki/Hierarchical_storage_management
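
As a rough sketch of the HSM idea (not a real HSM product), the function below migrates files that have not been accessed for a while from a primary-tier directory down to a cheaper secondary-tier directory; the directory paths and the 90-day threshold are assumptions for illustration.

import os, shutil, time

def migrate_cold_files(primary_dir, secondary_dir, max_idle_days=90):
    # Anything untouched for max_idle_days is staged down to the secondary tier.
    cutoff = time.time() - max_idle_days * 86400
    for name in os.listdir(primary_dir):
        path = os.path.join(primary_dir, name)
        if os.path.isfile(path) and os.path.getatime(path) < cutoff:
            shutil.move(path, os.path.join(secondary_dir, name))

# migrate_cold_files("/storage/tier1", "/storage/tier2")   # hypothetical tier mount points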

Storage Admission Tier (SAT)


The goal of Storage virtualization is to turn multiple disk arrays, made by different
vendors, scattered over the network, into a single monolithic storage device, which can
be managed uniformly. The Storage Admission Tier (SAT) is a tier put in front of the
primary tier, as the way into the storage. This affords a way to manage access and
policies in a way that can virtualize the storage. The SAT should conform to the
'Virtualize, Optimize & Manage' (VOM) paradigm:
• Virtualize: At the SAN layer, the amalgamation of multiple storage devices into one
single storage unit greatly simplifies management of storage hardware resource
allocation. At the NAS layer, the same degree of virtualization is needed to make
multiple heterogeneous file server shares appear at a more logical level, abstracting
the NAS implementations from the application tier.
• Optimize: Can include things like compression, data de-duplication and organizational
decisions about data placement.
• Manage: To control policies, security and access control (including rights
management) from the entry and exit point of the data to and from the storage network.

File Area Networks (FAN)


The combination of the Storage Admission Tier (SAT), the Tiered Storage Model and
NAS/SAN is known as a File Area Network (FAN). As of this writing, the FAN concept
cannot be seen in any mainstream products, but it is introduced here for completeness.

Network Storage Fault-tolerance


SAN Multipath I/O

Multipath I/O is a fault-tolerance and performance enhancement technique whereby
there is more than one physical path between a computer system and its mass storage
devices, through the buses, controllers, switches, and bridge devices connecting them.
In a well-designed SAN, it is likely that you will want a device to be accessed by the
host application over more than one path in order to potentially obtain better
performance, and to facilitate recovery in the case of adapter, cable, switch, or GBIC
failure. Should one controller, port or switch fail, the server's OS can route I/O through
the remaining controller transparently to the application, with no changes visible to the
applications other than perhaps some incremental latency. However, the same logical
volume within a storage device (LUN) may be presented many times to the server, once
through each of the possible paths to that LUN. To avoid this, make the device easier to
administer and eliminate confusion, multipathing software is needed. This is
responsible for making each LUN visible only once from the application and OS point of
view. In addition to this, the multipathing software is also responsible for failover
recovery and load balancing:
• Failover recovery: In a case of the malfunction of a component involved in making the
LUN connection, the multipathing software redirects all the data traffic onto other
available paths.
• Load balancing: The multipathing software is able to balance the data traffic equitably
over the available paths from the hosts to the LUNs.

There are different kinds of multipathing software available from different vendors.
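
A minimal sketch of what such multipathing software does, under assumed path names: present the LUN once, rotate I/O across the healthy paths (load balancing), and skip failed paths (failover recovery).

class MultipathDevice:
    def __init__(self, lun_id, paths):
        self.lun_id = lun_id                       # the single LUN presented to the OS
        self.healthy = {p: True for p in paths}    # path name -> is it usable?
        self._turn = 0

    def mark_failed(self, path):
        self.healthy[path] = False                 # e.g. HBA, cable, switch or GBIC failure

    def pick_path(self):
        paths = [p for p, ok in self.healthy.items() if ok]
        if not paths:
            raise IOError("all paths to LUN %s are down" % self.lun_id)
        self._turn += 1
        return paths[self._turn % len(paths)]      # simple round-robin over surviving paths

dev = MultipathDevice("lun0", ["hba0:switchA", "hba1:switchB"])
dev.mark_failed("hba0:switchA")                    # failover: I/O continues via the other path
print(dev.pick_path())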

Storage Replication

Depending on the details behind how the particular replication works, the application
layer may or may not be involved. If blocks are replicated without the knowledge of file
systems or applications built on top of the blocks being replicated, when recovering
using these blocks, the file system may be in an inconsistent state.
• A “Restartable” recovery implies that the application layer has full knowledge of the
replication, and so the replicated blocks that represent the applications are in a
consistent state. This means that the application layer (and possibly the OS) had a
chance to 'quiesce' before the replication cycle.
• A “Recoverable” recovery implies that some extra work needs to be done to the
replicated data before it can be useful in a recovery situation.

RPO & RTO


For replication planning, there are two important numbers to consider:
Recovery Point Objective (RPO) describes the acceptable amount of data loss,
measured in time. For example, assume that the RPO is 2 hours. If there is a complete
replication at 10:00am and the system dies at 11:59am without a new replication, the
data written between 10:00am and 11:59am cannot be recovered from the replica. This
amount of lost data has been deemed acceptable because of the 2-hour RPO. This is
the case even if it takes an additional 3 hours to get the site back into production;
production will continue from the 10:00am point in time, and all data in between will
have to be recovered manually through other means.
The Recovery Time Objective (RTO) is the duration of time and a service level within
which a business process must be restored after a disaster in order to avoid
unacceptable consequences associated with a break in business continuity. The RTO
attaches to the business process and not the resources required to support the process.
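
The RPO arithmetic from the example above, as a small sketch (the timestamps are assumed for illustration): the data-loss window is simply the gap between the last replica and the failure, compared against the objective.

from datetime import datetime

last_replica = datetime(2024, 1, 1, 10, 0)    # last complete replication at 10:00am
failure_time = datetime(2024, 1, 1, 11, 59)   # system dies at 11:59am
rpo_hours = 2

loss_hours = (failure_time - last_replica).total_seconds() / 3600
print("data loss window: %.2f h, within RPO: %s" % (loss_hours, loss_hours <= rpo_hours))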

Snapshots

Even though snapshots were discussed in the context of replication, snapshots have
their uses on local systems as well. Typically a snapshot is not a copy, since that would
take too long; rather, it is a freezing of all the blocks in a LUN, making them read-only
at that point in time. Any logical block that needs to be updated is allocated a new
physical block, thus preserving the original snapshot blocks as a backup. The new
blocks are what take up extra space, and they are allocated for the writes after the
snapshot took place. Allocating space in this manner can take substantially less space
than taking a whole copy. Deleting a snapshot can be done in the background,
essentially freeing any blocks that have been updated since the snapshot.
Snapshotting can be implemented in the management tools of the storage array, or built
into the OS. As with RAID, the advantage of building this functionality at the block-level
is that it can be abstracted from the file systems that are built on top of the blocks. Being
at this low level also has a drawback, in that when the snapshot is taken, the file
systems (and hence applications) may not be in a consistent state. There is usually a
need to quiesce the running machine (virtual or otherwise) before a snapshot is made.
This implies that all levels (up to the application) should be aware that they reside on a
snapshot-capable system.
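
A minimal sketch of the block-level idea described above (not any particular array's implementation): taking a snapshot freezes the current block map, and later writes go to new blocks, so the snapshot still sees the old data.

class Volume:
    def __init__(self, blocks):
        self.blocks = dict(enumerate(blocks))   # logical block number -> current data
        self.snapshot = None

    def take_snapshot(self):
        self.snapshot = dict(self.blocks)       # freeze the map; no data is copied

    def write(self, lbn, data):
        self.blocks[lbn] = data                 # new block; the snapshot keeps the old one

vol = Volume([b"aaa", b"bbb", b"ccc"])
vol.take_snapshot()
vol.write(1, b"BBB")
print(vol.blocks[1], vol.snapshot[1])           # b'BBB' b'bbb'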

Terminology
Thin Provisioning & Over-Allocation
[Thin provisioning is called sparse volumes in some contexts] In a storage consolidation
environment, where many applications are sharing access to the same storage array,
thin provisioning allows administrators to maintain a single free space buffer pool to
service the data growth requirements of all applications. This avoids the poor utilization
rates, often as low as 10%, that occur on traditional storage arrays where large pools of
storage capacity are allocated to individual applications, but remain unused (i.e. not
written to). This traditional model is often called fat provisioning. On the other hand,
over-allocation or over-subscription is a mechanism that allows server applications to be
allocated more storage capacity than has been physically reserved on the storage array
itself. This allows flexibility in growth and shrinkage of application storage volumes,
without having to predict accurately how much a volume will grow or contract. Physical
storage capacity on the array is only dedicated when data is actually written by the
application, not when the storage volume is initially allocated.
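
A minimal sketch of thin provisioning with over-allocation (the sizes and volume names are made up): volumes are promised more capacity than the pool physically has, and pool space is consumed only when blocks are actually written.

class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.volumes = {}   # name -> {"logical": promised GB, "written": set of block numbers}

    def create_volume(self, name, logical_gb):
        # Over-allocation: the sum of logical sizes may exceed physical_gb.
        self.volumes[name] = {"logical": logical_gb, "written": set()}

    def write_block(self, name, block_no, block_gb=1):
        used_gb = sum(len(v["written"]) for v in self.volumes.values()) * block_gb
        if used_gb + block_gb > self.physical_gb:
            raise IOError("pool exhausted: add physical capacity")
        self.volumes[name]["written"].add(block_no)

pool = ThinPool(physical_gb=100)
pool.create_volume("app1", logical_gb=80)
pool.create_volume("app2", logical_gb=80)   # 160 GB promised from a 100 GB pool
pool.write_block("app1", 0)                 # physical capacity is consumed only on write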
LUN Masking
Logical Unit Number Masking or LUN masking is an authorization process that makes a
Logical Unit Number available to some hosts and unavailable to other hosts. The
security benefits are limited in that with many HBAs it is possible to forge source
addresses (WWNs/MACs/IPs). However, it is mainly implemented not as a security
measure per se, but rather as protection against misbehaving servers corrupting
disks belonging to other servers. For example, Windows servers attached to a SAN will
under some conditions corrupt non-Windows (Unix, Linux, NetWare) volumes on the
SAN by attempting to write Windows volume labels to them. By hiding the other LUNs
from the Windows server, this can be prevented, since the Windows server does not
even realize the other LUNs exist. (http://en.wikipedia.org/wiki/LUN_masking)
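
A minimal sketch of LUN masking on an array, with made-up WWPNs and LUN numbers: each initiator is mapped to the LUNs it is allowed to see, so a misbehaving host never even discovers the others.

masking_table = {
    "10:00:00:00:c9:aa:00:01": {0, 1},   # Windows host may see LUN 0 and 1 only
    "10:00:00:00:c9:bb:00:02": {2, 3},   # Linux host may see LUN 2 and 3 only
}

def report_luns(initiator_wwpn, all_luns=range(8)):
    # The array answers a LUN inventory query with only the unmasked LUNs.
    visible = masking_table.get(initiator_wwpn, set())
    return [lun for lun in all_luns if lun in visible]

print(report_luns("10:00:00:00:c9:aa:00:01"))   # [0, 1]; the other LUNs stay hidden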
Data De-duplication
This is an advanced form of data compression. Data de-duplication software, whether
delivered as an appliance, offered separately or included as a feature in another storage
product, provides file-, block- or sub-block-level elimination of duplicate data by storing
pointers to a single copy of the data item. This concept is sometimes referred to as data
redundancy elimination or single-instance store. The effects of de-duplication primarily
involve the improved cost structure of disk-based solutions. As a result, businesses may
be able to use disks for more of their backup operations and to retain data on disks for
longer periods of time, enabling restoration from disks.
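
A minimal sketch of block-level de-duplication (single-instance store), with an assumed 8-byte block size: blocks are keyed by a content hash, duplicates are stored once, and files keep pointers to the shared copies.

import hashlib

class DedupStore:
    def __init__(self):
        self.blocks = {}   # content hash -> block data (single instance store)
        self.files = {}    # file name -> list of block hashes (the pointers)

    def put(self, name, data, block_size=8):
        hashes = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)   # stored only if not seen before
            hashes.append(digest)
        self.files[name] = hashes

store = DedupStore()
store.put("a.txt", b"blockA__blockA__blockB__")        # "blockA__" occurs twice
print(len(store.blocks), len(store.files["a.txt"]))    # 2 unique blocks, 3 pointers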
