Storage: Ratnadeep Bhattacharya
Ratnadeep Bhattacharya
Module 1
INTRODUCTION TO STORAGE
Storage simply means a collection of devices
that hold the bits that constitute data.
The storage subsystem can consist of:
- Hard drives (Random access devices)
- Tape drives (Sequential access devices)
- Optical or magnetic drives (CD/DVD/Floppy)
- Memory (Random access device)
Hard Drives
• Data is written and read in a random-access manner. The disk
itself is a stack of circular platters with concentric tracks
defined on each surface. The platters rotate on a spindle while
a read/write head moves across the tracks to read/write data.
• Supports logical addressing scheme, implemented with the
help of file systems.
• File systems simply assign logical addresses to physical
blocks and make them appear sequential to the external
world, even though the underlying physical placement is random.
• Partition tables define the disk geography and the block
structure of the file system.
Tape Drives
• Mostly non-intelligent or semi-intelligent systems.
This means that although some devices in this class
let you do a few operations (mostly indexing) while
the data is still in the drive, direct data access is
not possible because no file system structure is
defined.
• The format in which the data is written is mostly
not recognised by the host, though on some tape
drives you can run commands to identify the date
of the backup, the kind of data held and so on.
Optical and magnetic media
• CD/DVD drives use optical technology (a laser
reads and writes marks on the disc surface) to
save data while achieving greater storage
densities.
• Floppy drives store data in the same manner
but magnetically. Storage densities are
much lower.
• The basic principle of storing is almost the same.
Memory
• A very important unit in both processing and transmission
of information.
• Though not generally seen as a storage unit, memory has a
very important role in the utilisation of processor speed.
This has driven many refinements in how memory
modules handle data. A basic understanding of memory
will help reduce adverse effects on data such as loss or
corruption.
• The system memory was introduced to hold data closer to
the processor chip to enable faster access by the core to
data.
Memory (ctd.)
• This concept was later re-introduced in the form of
caches to store data in different parts of its travel from
the disk to the processor along the system bus.
• More recently, up to three levels of cache have been added
to processor chips. They hold duplicate data and use
dedicated addressing structures (such as tag RAMs) and
data access techniques for faster access.
• From a storage point of view, we are most concerned
with caches found on RAID cards, HBAs and controllers.
Storage today
• Today we look at storage in a very different manner – with
awe; with fear; and of course with excitement.
• Different storage devices can be held at remote locations
and accessed by the operating system just like local disks. This
has also introduced an impressive array of technologies in
the storage field.
• The key words are:
– Speed
– Availability
– Manageability
– Redundancy
Transmission technologies of today
• Main players in this arena are:
– Fiber Channel
– iSCSI
• Fiber Channel wins in speed and security by using lasers to transmit data with
protocols like FCP (local), FCIP and iFCP (remote, using TCP/IP).
• iSCSI wins in cost, familiarity and distance.
• FC is touted to be lossless, but that is not entirely correct. Just as we
have loss due to impedance and crosstalk created by magnetic flux in
electrical lines, we also have loss in FC due to reflection, refraction and
deflection in optical lines.
• This loss seriously hampers the ability of the FCP protocol to
carry data over long distances.
• We can transmit FC frames reliably over only 50-100 km, and even then
only by using CWDM/DWDM technologies over dark fiber.
Parallel data transmission protocols
SCSI ARCHITECTURE
The SCSI family
• The SCSI family is mainly recognised by their corresponding
ANSI revision numbers.
• SCSI started with a speed of 5 MB/sec. Today parallel SCSI is
capable of 640 MB/sec while SAS is dealing with 3 and 6
Gb/sec speeds.
• ANSI versions for SCSI:
– ANSI 1: SCSI 1
– ANSI 2: SCSI 2
– ANSI 3: SCSI 3
– ANSI 4: SCSI 3 SPC 2
– ANSI 5: SCSI 3 SPC 3
SCSI diagram
[Figure: a SCSI bus running from the initiator at one end to the terminator at the other]
SCSI architecture
• A SCSI channel always begins at the initiator and ends at the
terminator.
• Traditionally, there are only eight IDs on a SCSI bus – 0 to 7.
• By convention, the initiator has the SCSI ID of 7, the highest priority.
• The terminator does not have an ID. Its sole purpose in life is to
terminate any electrical signal that manages to reach the end of
the bus. This ensures that the signal does not reflect (the
similarities between light and electricity are endless) back into the
bus creating noise and distortion.
• A terminator can be passive (a resistor network) or active (an IC).
• The rest of the IDs are open for client devices.
SCSI Bus phases
The most important SCSI bus phases are:
• Bus free – the BSY and SEL signals are simultaneously false.
• Arbitration – the device requesting the bus (initiator or target)
asserts the BSY signal and its own SCSI ID on the bus. If no other
device asserts a higher SCSI ID, the requesting device gains
control of the bus, the first step towards setting up an I_T nexus.
• Selection – the device that won arbitration asserts SEL together
with the ID of the target it wants to address, so that a command or
data transfer can follow.
• Reselection – allows a target to re-establish a connection to
the initiator which was previously initiated by the initiator but
suspended by the target.
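The arbitration rule (highest SCSI ID wins) can be sketched in a few lines. This is only an illustration for a narrow bus with IDs 0 to 7; the function name `arbitrate` is made up for the example, and real arbitration also involves signal timing that is ignored here.

```python
def arbitrate(requesting_ids):
    """Return the winning SCSI ID, or None if nobody asked for the bus."""
    data_bus = 0
    for scsi_id in requesting_ids:
        if not 0 <= scsi_id <= 7:
            raise ValueError(f"SCSI ID out of range on a narrow bus: {scsi_id}")
        # Each requesting device asserts the data line that matches its ID.
        data_bus |= 1 << scsi_id
    if data_bus == 0:
        return None  # nobody is arbitrating, the bus stays free
    # The highest asserted data line belongs to the winner.
    return data_bus.bit_length() - 1


print(arbitrate([2, 5, 7]))  # 7
print(arbitrate([1, 4]))     # 4
```

Because ID 7 always wins, it makes sense that the initiator is normally given that ID.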
SCSI flavours
• Fast SCSI – can process 10 million transfers
per second. Has a bus width of 8 bits and supports 8 devices.
• Wide SCSI – can process 5 million transfers
per second. Has a bus width of 16 or 32 bits.
Generally a width of 16 is used, which supports 16 devices.
• Fast and Wide SCSI – combines the above two.
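The effect of clock rate and bus width on throughput is simple arithmetic: transfers per second multiplied by the number of bytes moved per transfer. The sketch below assumes that "width" means the data bus width in bits; the function name is hypothetical.

```python
def bus_throughput_mb_per_s(mega_transfers_per_s, bus_width_bits):
    """Peak throughput in MB/s = million transfers per second * bytes per transfer."""
    return mega_transfers_per_s * bus_width_bits / 8


print(bus_throughput_mb_per_s(10, 8))   # Fast SCSI: 10.0
print(bus_throughput_mb_per_s(5, 16))   # Wide SCSI (16 bit): 10.0
print(bus_throughput_mb_per_s(10, 16))  # Fast and Wide SCSI: 20.0
```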
The SCSI look
• SCSI commands are packaged into blocks called
Command Descriptor Blocks (CDB).
• A CDB has the following:
– An op-code (the first byte)
– The LUN ID (optional)
– Any command parameters if required
– A control byte (the last byte)
Structure of the op-code
• The op-code is always the first byte of the
CDB.
– Bits 5 to 7 (the top three bits) indicate which
group the command belongs to.
– Bits 0 to 4 indicate the actual command.
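As an illustration of what a CDB looks like on the wire, here is a minimal sketch that packs a ten byte READ(10) command. The layout used (op-code 0x28, a flags byte, a 4 byte logical block address, a reserved byte, a 2 byte transfer length and the control byte last) is the commonly documented one and is not taken from these slides; the helper name `build_read10_cdb` is made up.

```python
import struct

READ_10 = 0x28  # op-code of the ten byte READ command


def build_read10_cdb(lba, num_blocks, control=0):
    """Pack a READ(10) CDB: op-code first, control byte last, big-endian fields."""
    return struct.pack(
        ">BBIBHB",
        READ_10,     # byte 0: op-code
        0,           # byte 1: flags (older revisions kept the LUN in the top bits)
        lba,         # bytes 2-5: logical block address
        0,           # byte 6: reserved
        num_blocks,  # bytes 7-8: number of blocks to read
        control,     # byte 9: control byte
    )


cdb = build_read10_cdb(lba=2048, num_blocks=8)
print(len(cdb), cdb.hex())  # 10 28000000080000000800
```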
Op-code groups
There are 8 op-code groups in all:
1. Group 0 – six byte commands.
2. Group 1 – ten byte commands.
3. Group 2 – also ten byte commands.
4. Group 3 – reserved.
5. Group 4 – sixteen byte commands.
6. Group 5 – twelve byte commands.
7. Group 6 and Group 7 are for vendor specific
commands.
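The group table can be turned into a small decoder. This sketch assumes the usual layout in which the group code sits in the top three bits of the op-code (bits 5 to 7) and the command code in the lower five bits; the names `decode_opcode` and `CDB_LENGTH_BY_GROUP` are illustrative only.

```python
# CDB length in bytes per group; groups 3, 6 and 7 have no fixed length here
# (reserved or vendor specific).
CDB_LENGTH_BY_GROUP = {0: 6, 1: 10, 2: 10, 4: 16, 5: 12}


def decode_opcode(opcode):
    """Split an op-code byte into (group, command, cdb_length_or_None)."""
    group = (opcode >> 5) & 0x07  # top three bits
    command = opcode & 0x1F       # lower five bits
    return group, command, CDB_LENGTH_BY_GROUP.get(group)


print(decode_opcode(0x12))  # (0, 18, 6)
print(decode_opcode(0x28))  # (1, 8, 10)
print(decode_opcode(0x88))  # (4, 8, 16)
print(decode_opcode(0xC0))  # (6, 0, None)
```

The op-codes 0x12, 0x28 and 0x88 are commonly listed as INQUIRY, READ(10) and READ(16); their CDB lengths match the group table above.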
Module 3
FIBER CHANNEL
Fiber Channel
This is a serial bus architecture.
Each SCSI bus phase is considered a sequence by FC and
broken into 2K chunks.
Any SCSI operation is an ‘Exchange’. Exchanges are broken
into ‘Sequences’ and sequences into ‘Frames’.
SCSI packets from the host are sent to the HBA/SP ports,
which have the GLM/GBIC installed in them. The GBIC is a
pluggable transceiver on the port that converts the electrical
signal into an optical one. When FC has to be carried over
SONET/SDH links, the frames are encapsulated using an ITU
standard called the Generic Framing Procedure (GFP).
FC packets
• Start of frame (4 bytes)
• Frame header (24 bytes)
• Data field (up to 2112 bytes)
• CRC error check (4 bytes)
• End of frame (4 bytes)
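Using the field sizes listed above, the number of frames and the bytes on the wire for a given payload can be estimated as follows. This is a rough sketch: it treats the whole data field as usable payload and ignores inter-frame fill words; the function names are made up.

```python
SOF, HEADER, MAX_DATA, CRC, EOF = 4, 24, 2112, 4, 4
OVERHEAD = SOF + HEADER + CRC + EOF  # 36 bytes of framing per frame


def frames_needed(payload_bytes, data_per_frame=MAX_DATA):
    """Number of frames required to carry payload_bytes (ceiling division)."""
    return -(-payload_bytes // data_per_frame)


def bytes_on_wire(payload_bytes, data_per_frame=MAX_DATA):
    """Payload plus the per-frame framing overhead."""
    return payload_bytes + frames_needed(payload_bytes, data_per_frame) * OVERHEAD


print(OVERHEAD + MAX_DATA)         # largest frame: 2148
print(frames_needed(65536, 2048))  # 64 KiB in 2K chunks: 32
print(bytes_on_wire(65536, 2048))  # 66688
```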
FC header
• CTL (control information)
• Source address
• Destination address
• Type
• Seq_cnt
• Seq_ID
• Exchange_ID
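A sketch of how the 24 byte header could be unpacked is shown below. The byte offsets follow the commonly documented FC frame header layout (R_CTL and D_ID, CS_CTL and S_ID, TYPE and F_CTL, SEQ_ID, DF_CTL, SEQ_CNT, OX_ID, RX_ID, parameter) and are an assumption, not something stated on the slide; the sample bytes are invented.

```python
import struct
from typing import NamedTuple


class FCHeader(NamedTuple):
    r_ctl: int       # routing control
    d_id: int        # destination address (24 bit)
    s_id: int        # source address (24 bit)
    frame_type: int  # upper-layer protocol carried in the frame
    f_ctl: int       # frame control
    seq_id: int
    seq_cnt: int
    ox_id: int       # originator exchange ID
    rx_id: int       # responder exchange ID


def parse_fc_header(raw):
    """Unpack a 24 byte FC frame header (big-endian) into named fields."""
    if len(raw) != 24:
        raise ValueError("an FC frame header is exactly 24 bytes")
    w0, w1, w2, seq_id, _df_ctl, seq_cnt, ox_id, rx_id, _param = struct.unpack(
        ">IIIBBHHHI", raw
    )
    return FCHeader(
        r_ctl=w0 >> 24, d_id=w0 & 0xFFFFFF,
        s_id=w1 & 0xFFFFFF,
        frame_type=w2 >> 24, f_ctl=w2 & 0xFFFFFF,
        seq_id=seq_id, seq_cnt=seq_cnt, ox_id=ox_id, rx_id=rx_id,
    )


sample = bytes.fromhex("06010200" "00010300" "08290000" "01000000" "1234ffff" "00000000")
hdr = parse_fc_header(sample)
print(hex(hdr.s_id), hex(hdr.d_id), hex(hdr.ox_id))  # 0x10300 0x10200 0x1234
```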
FC ports
• N_Port – node port. Can be an HBA/SP port.
• NL_Port – node loop port. Used when an arbitrated loop topology
is used instead of a switched topology.
• F_Port – fabric port. Generally stands for a switch port.
• FL_Port – fabric port with arbitrated loop capabilities. Ports on
a switch used to integrate an arbitrated loop topology.
• G_Port – generic port. Always on the switch. Can be an E_Port
or an F_Port.
• E_Port – expansion port. On the switch. Interconnects switches.
• TE_Port – trunking expansion port. Again on the switch. Similar to
link aggregation.
Class of Service
• This defines delivery options for frame
transmission.
• They define connection, in-order delivery and
confirmation of delivery or non-delivery of
frames.
• There are three classes: 1, 2 and 3.
• SCSI over FC uses class 3.
• Error recovery is passed completely to the SCSI
layers.
Zoning
• Zoning is used to present LUNs to servers in a secured way
over a switched network.
• Each node has a WWNN and each port has a WWPN.
These numbers are used to create logical connection maps
and hold them in an ‘active’ configuration file. Only one
configuration can be active at any time, although multiple
zones can exist inside or outside the active configuration.
• Zoning using WWNs is called soft zoning. In hard zoning,
the logical connection maps are created between
physical ports on the switch.
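The idea behind WWN-based zoning can be modelled as set membership: two ports may talk only if at least one zone in the active configuration contains both of them. The zone names and WWPNs below are invented for the example, and a real switch enforces this in its name server and hardware rather than in a lookup like this.

```python
# Hypothetical active configuration: zone name -> set of member WWPNs.
ACTIVE_CONFIG = {
    "zone_db_server": {"10:00:00:00:c9:aa:bb:01", "50:06:01:60:11:22:33:01"},
    "zone_mail_server": {"10:00:00:00:c9:aa:bb:02", "50:06:01:60:11:22:33:01"},
}


def can_communicate(active_config, wwpn_a, wwpn_b):
    """True if some zone in the active configuration contains both ports."""
    return any(
        wwpn_a in members and wwpn_b in members
        for members in active_config.values()
    )


# Both hosts can reach the shared array port, but not each other.
print(can_communicate(ACTIVE_CONFIG, "10:00:00:00:c9:aa:bb:01", "50:06:01:60:11:22:33:01"))  # True
print(can_communicate(ACTIVE_CONFIG, "10:00:00:00:c9:aa:bb:01", "10:00:00:00:c9:aa:bb:02"))  # False
```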
Registered State Change Notification
• Information about all zones (created on the switch)
and all devices (connected to the switch) is held in
an FC database inside the switch.
• Whenever there is any change on this database,
notifications are sent to the devices attached via
RSCN (given that the device attached supports RSCN).
• There are two types of notifications:
– A node event: when a node port generates an event.
– A fabric event: when the switch generates an event.
FCIP (Fiber Channel over IP)
• Connects multi-site FC-SANs over a TCP/IP link as a logical SAN.
• SCSI packets are encapsulated inside FC frames and then the FC frames are
encapsulated inside IP datagrams.
• The tunnels behave like ISLs and can be used for trunking and load
balancing in the same manner.
• A disruption in the IP network also affects the local FC network
temporarily and generates RSCNs.
• All the SAN islands appear as one as they start using a common and
shared namespace.
• The major drawback of this protocol is that there is no way to delink the
IP network from the FC network.
• Generally, the IP tunnel is completely transparent but low-level FC-AL
signals cannot traverse the link.
FCIP ctd.
• The frames employed over the IP tunnels to
establish connection are called FSF – FCIP
Special Frames.
• The receiver verifies the packet and if it is
acceptable then echoes the same back to the
tunnel initiator in an unmodified format.
• Then the initiator verifies the packet, after
which transmission can start.
FSF frames
• FC identifier of tunnel initiator.
• FC endpoint identifier of tunnel initiator.
• FC identifier of the intended destination.
• A 64 bit random number to uniquely identify
the FSF.
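The echo-based handshake can be sketched as below. This is a toy model of the exchange described on the slides, not the wire format: the field names, the acceptance check on the receiver and the WWN strings are all assumptions made for the example.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class FSF:
    initiator_id: str        # FC identifier of the tunnel initiator
    initiator_endpoint: str  # FC endpoint identifier of the tunnel initiator
    destination_id: str      # FC identifier of the intended destination
    nonce: int               # 64 bit random number identifying this FSF


def make_fsf(initiator_id, initiator_endpoint, destination_id):
    return FSF(initiator_id, initiator_endpoint, destination_id, secrets.randbits(64))


def receiver_reply(fsf, local_id):
    """Echo the FSF unchanged if it is addressed to us, otherwise reject it."""
    return fsf if fsf.destination_id == local_id else None


def initiator_accepts(sent, echoed):
    """The initiator only proceeds if the echo matches what it sent."""
    return echoed is not None and echoed == sent


sent = make_fsf("wwn-site-a", "endpoint-1", "wwn-site-b")
echo = receiver_reply(sent, local_id="wwn-site-b")
print(initiator_accepts(sent, echo))  # True
```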
iFCP (Internet Fiber Channel Protocol)
ISCSI OVERVIEW
iSCSI features
• Same old serial SCSI architecture.
• The first to allow block device access over an ordinary (electrical) IP network.
• Requires Gigabit networks.
• iSCSI over WAN can have latency issues.
• TOE cards save CPU overhead.
• The same old three-way (SYN – SYN/ACK – ACK) TCP handshake is used.
• TCP takes care of ordering and retry issues.
• Uses DNS and SLP services. Can be configured to use iSNS as well.
• Removes distance limitations inherent in FC.
• Encapsulates SCSI CDBs inside TCP packets turning them into iSCSI
PDUs.
• Automatic target discovery is handled by the ‘SendTargets’ command.
iSCSI functional overview
The iSCSI target ports form a target portal group. Each port has
the portal group tag attached to it.
An iSCSI login happens in the following manner:
• First the initiator port creates a SCSI port ID: the iqn name, ‘i’, and the hex value of the
ISID.
• Then (in case of automatic discovery) it sends the SendTargets command out
to the target. The target replies with all the iSCSI portals associated with that
target portal group.
• The target forms the SCSI port ID with: the iqn name, ‘t’, and the hex value of the
target portal group tag. This is the reason that even a single target port needs
to form a target portal group.
• The target identifies each session with a number. During the initiator login
this number is always 0. After the login the target identifies the session with
a unique ID.
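The way the two SCSI port IDs are put together can be sketched as simple string formatting. The comma-separated form, the 12 hex digits for the ISID (6 bytes) and the 4 hex digits for the group tag (2 bytes) are assumptions based on the usual iSCSI conventions; the iqn names are invented.

```python
def initiator_port_name(iqn, isid):
    """iqn name + ',i,' + the 6 byte ISID in hex."""
    return f"{iqn},i,0x{isid:012x}"


def target_port_name(iqn, group_tag):
    """iqn name + ',t,' + the 2 byte group tag in hex."""
    return f"{iqn},t,0x{group_tag:04x}"


print(initiator_port_name("iqn.2001-04.com.example:host1", 0x00023D000001))
# iqn.2001-04.com.example:host1,i,0x00023d000001
print(target_port_name("iqn.2001-04.com.example:array1", 1))
# iqn.2001-04.com.example:array1,t,0x0001
```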
iSCSI PDU (Protocol Data Unit)
• PHY header
• IP header
• TCP header
• iSCSI BHS
• CDB
• Data
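To make the BHS and CDB layers more concrete, here is a minimal sketch that packs a 48 byte Basic Header Segment for a SCSI command with the CDB embedded in its last 16 bytes. The field order and flag values follow the commonly documented SCSI Command PDU layout and are not taken from these slides; the function name and sample values are made up, and the Ethernet/IP/TCP headers are left to the network stack.

```python
import struct

SCSI_COMMAND = 0x01                             # BHS op-code for a SCSI command
FLAG_FINAL, FLAG_READ, FLAG_WRITE = 0x80, 0x40, 0x20


def build_scsi_command_bhs(cdb, task_tag, expected_len, cmd_sn, exp_stat_sn,
                           lun=bytes(8), read=True):
    """Pack a 48 byte BHS; the CDB is zero-padded to 16 bytes."""
    if len(cdb) > 16:
        raise ValueError("longer CDBs need an additional header segment")
    flags = FLAG_FINAL | (FLAG_READ if read else FLAG_WRITE)
    return struct.pack(
        ">BB2xB3s8sIIII16s",
        SCSI_COMMAND, flags,
        0,             # total AHS length
        bytes(3),      # data segment length (no immediate data in this sketch)
        lun,           # 8 byte LUN field
        task_tag,      # initiator task tag
        expected_len,  # expected data transfer length in bytes
        cmd_sn, exp_stat_sn,
        cdb,           # struct pads the CDB with zeros up to 16 bytes
    )


read10 = bytes.fromhex("28000000080000000800")  # READ(10), LBA 2048, 8 blocks
bhs = build_scsi_command_bhs(read10, task_tag=1, expected_len=8 * 512,
                             cmd_sn=1, exp_stat_sn=1)
print(len(bhs))  # 48
```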
iSCSI identifiers
• iSCSI names – iSCSI nodes have globally unique names.
The iqn, eui format and any aliases are supported by
ESX.
• ISID – initiator session ID. Identifies the session between
initiator and target.
• CID – iSCSI connection identifier. An iSCSI session
may have several TCP connections. They aggregate
bandwidth and provide load balancing.
• iSCSI portals – combination of the IP address of
initiator/target and the TCP port number.
Module 5