Nondisruptive Operations for NetApp ONTAP 9.0 v1.3 - Lab Guide
ONTAP 9.0
1 Introduction
1.3 Prerequisites
2 Lab Environment
3 Lab Activities
4 Appendix 1
5 References
6 Version History
1.3 Prerequisites
This lab assumes that you are familiar with the concepts introduced in the Basic Concepts for NetApp ONTAP 9.0
lab.
This lab also assumes that you know how to use PuTTY, and how to launch and log in to System Manager to manage a cluster. If you are unfamiliar with any of those procedures, please review Appendix 1 of this lab guide.
Figure 2-1:
All of the servers and storage controllers presented in this lab are virtual devices, and the networks that
interconnect them are exclusive to your lab session. While we encourage you to follow the demonstration steps
outlined in this lab guide, you are free to deviate from this guide and experiment with other ONTAP features that
interest you. While the virtual storage controllers (vsims) used in this lab offer nearly all of the same functionality
as physical storage controllers, they are not capable of providing the same performance as a physical controller,
which is why these labs are not suitable for performance testing.
The Lab Host Credentials table provides a list of the servers and storage controller nodes in the lab, along with their IP addresses.
The Preinstalled NetApp Software table lists the NetApp software that is pre-installed on the various hosts in this
lab.
Hostname      Description
JUMPHOST      Data ONTAP DSM v4.1 for Windows MPIO, Windows Unified Host Utility Kit v7.0.0, NetApp PowerShell Toolkit v4.2.0
RHEL1, RHEL2  Linux Host Utilities Kit v7.0
Note: The table does not contain entries for iSCSI because that protocol utilizes multipathing and ALUA
to protect against path failure, so LIF failover is not applicable.
Please only select one entry at a time for the LIF migration test in order to maintain optimum simulator
performance. Once you complete the test for one entry you are welcome to come back and test another.
Note: The “File” column of the table contains information you will be using later in this lab section.
1. You will be generating I/O load to the volume for your selected entry using the sio command, and you
will run sio from the client listed for your chosen entry in the LIF Migrate Exercise Choices table.
• If the listed client is rhel1, then you will need to open a PuTTY session to rhel1. The PuTTY
launch icon is on the taskbar of jumphost. Log in to rhel1 using the username root and the
password Netapp1!.
Figure 3-1:
Figure 3-2:
2. In the appropriate client CLI window for your exercise choice, launch sio using the following syntax:
sio 0 0 4k 0 50m 600 2 <File> -create
where <File> is the “File” value from the LIF Migrate Exercise Choices table. This command initiates a
sequential write operation to <File> for a period of 10 minutes, which should be sufficient time to
complete this LIF Migration exercise.
If you opted to use CIFS for your test, then the sio invocation would look like this:
Windows PowerShell
Copyright (C) 2013 Microsoft Corporation. All rights reserved.
PS C:\Users\Administrator.DEMO> sio 0 0 4k 0 50m 600 2 S:\engineering\cifs.sio -create
Version: 6.39
Read: 0 Rand: 0 BlkSz: 4096 BegnBlk: 0 EndBlk: 12800 Secs: 600 Threads: 2 Devs: 1 S:\engineering\cifs.sio
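For reference, sio's positional arguments map onto the parameters it echoes back at startup. The annotated reading below is inferred from that echo rather than taken from sio's own documentation, so verify it against your sio build's usage text:

# Inferred mapping of sio's positional arguments (an assumption based on the
# parameter echo above; confirm with your sio build's usage text):
# sio <read%> <rand%> <blksize> <begin-offset> <file-size> <secs> <threads> <file> -create
#      0       0       4k        0              50m         600    2        S:\engineering\cifs.sio

A read percentage of 0 with a randomness percentage of 0 gives 100% sequential writes, and a 50m file in 4k blocks spans blocks 0 through 12800, which matches the "EndBlk: 12800" value echoed above.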
The sio command will continue running until the specified duration is reached, or it encounters an error,
in which case it will generate an error message. If the sio command is disrupted by the LIF migration,
then you will see an error message in the sio output.
The remaining screenshots and CLI examples in this section show the steps required to complete the
Windows entry in the LIF Migrate Exercise Choices table. If you chose a different entry, you will need
to substitute your entry’s values where appropriate.
3. On the desktop of Jumphost, launch Chrome and log into System Manager. (If you need further
assistance in opening System Manager then see Appendix 1.)
Figure 3-3:
In System Manager, observe the current port assignments for the NAS LIFs for svm1.
4. In the command bar at the top of System Manager, select the Network tab.
5. In the “Network” pane, select the Network Interfaces tab.
6. In the list of interfaces, locate the entries for “svm1_cifs_nfs_lif1” and “svm1_cifs_nfs_lif2”, and note their
current port assignments, which should be cluster1-01:e0c and cluster1-02:e0c, respectively.
Figure 3-4:
svm1 uses DNS load balancing for its NAS LIFs, so you cannot predict in advance which of those two
LIFs the host running sio will use to send I/O to the CIFS share or NFS-mounted volume. Since
you have already started sio on your client, you now need to access the ONTAP CLI so you
can determine which LIF is handling that traffic.
7. If you do not currently have a PuTTY session open to cluster1, open one now by right-clicking on
the PuTTY icon on the taskbar and selecting PuTTY from the context menu. Log in to cluster1 with
the username admin and the password Netapp1!.
Figure 3-5:
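One way to identify the active LIF from the CLI is the network connections active show command. The invocation below, including the -vserver filter, is a suggested form rather than the lab's verbatim step, so adjust it as needed:

cluster1::> network connections active show -vserver svm1

In the output, match your client’s IP address in the remote address column to the interface name listed for those connections; that interface is the LIF carrying the sio traffic.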
In the preceding command output, notice that “svm1_cifs_nfs_lif2” is carrying all the traffic, so in this
example that is the LIF you would want to migrate, and it is the LIF that the rest of the instructions in
this section use. Remember to substitute the appropriate LIF name from your lab where needed.
Now go back to System Manager to begin the LIF migration.
9. In the “Network” pane of System Manager, locate the LIF you identified in the preceding step in the
interface list and select it. Make note of the current port assignment for your LIF, as you will need this
information later. The LIF in this example is located on node cluster1-02 port e0c.
10. Right-click on the LIF.
11. In the context menu, select Migrate.
Figure 3-6:
Figure 3-7:
The Warning dialog closes, and is replaced by the “Migrate Interface” window.
It is in this window that you will select the node and port that you want to migrate the LIF to, a decision
you will make based on the LIF’s current node and port location. In the example shown here the LIF
“svm1_cifs_nfs_lif2” was located on node cluster1-02 port e0c, in which case you would want to migrate
the LIF to node cluster1-01. Since “svm1_cifs_nfs_lif1” is already on port e0c of that node, choose the
e0d port as the target port for “svm1_cifs_nfs_lif2”.
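For reference, the CLI equivalent of the migration you are about to perform in System Manager looks like the following (a sketch using this example’s names; substitute your own LIF, node, and port):

cluster1::> network interface migrate -vserver svm1 -lif svm1_cifs_nfs_lif2 -destination-node cluster1-01 -destination-port e0d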
13. Expand the ports list for your target node, cluster1-01 in this example, and select port e0d.
14. Click Migrate.
Figure 3-8:
Although you are not using it in this exercise, notice the Migrate Permanently checkbox in this window.
Checking this box indicates that the LIF’s home port should also be set to this new port value.
The “Migrate Interface” window closes, and focus returns to the Networks pane in System Manager.
The LIF quickly migrates over to the new node and port, and the sio program generates no error
messages during or after the migration, indicating that it was unaffected by the operation.
15. The “Current Port” value shown for the LIF in the Network Interfaces list has changed to reflect the
LIF’s new port assignment. The small red X next to the current port entry indicates that the LIF does
not currently reside on its configured home port.
Figure 3-9:
Most likely you do not want to leave this LIF on this alternate node and port indefinitely. In most cases
you would perform whatever maintenance required you to move the LIF in the first place, and when
finished you would move the LIF back to its original location. So now send the LIF back to its home
port.
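The Send to Home action you use in the next steps has a direct CLI counterpart, network interface revert, which returns a LIF to its home port in one step (a sketch using this example’s names):

cluster1::> network interface revert -vserver svm1 -lif svm1_cifs_nfs_lif2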
16. Right-click on the list entry for the LIF.
17. Select Send to Home from the context menu.
Figure 3-10:
The LIF migrates back to its home port, once again without disrupting the sio utility.
18. The “Current Port” value for the LIF returns to its original value in the Network Interfaces list, and the
red X disappears to indicate that the LIF is back on its home port.
Figure 3-11:
When sio starts, it generates several lines of output, and then goes silent until it reaches the end of its
scheduled execution interval, at which point it outputs quite a few lines of statistical information about
the execution before exiting. If the LIF migration disrupted any of sio’s write operations to the target
file, then around the same time you will see obvious error messages in the sio output, and sio will have
terminated abnormally. You should see no such error messages during this exercise, as LIF migration
is non-disruptive. If sio is still running, you can terminate it by issuing a Ctrl-c within the PowerShell
window.
Figure 3-12:
• If the listed client is jumphost, then you will need to open PowerShell (if you do not already
have one open). The PowerShell launch icon is on the taskbar of jumphost.
Figure 3-13:
2. In the appropriate client CLI window for your exercise choice, launch sio using the following syntax:
sio 0 0 4k 0 50m 600 2 <File> -create where <File> is the value that comes from the Volume
Move Exercise Choices table. This will initiate a sequential write operation to <File> for a period of 10
minutes, which should be sufficient time to complete the volume move operation.
Note: The following screenshots and CLI examples in this section show the steps required to
complete the volume move exercise for the Linux and NFS entry from the Volume Move Exercise
Choices table. If you chose to utilize a different table entry then you will need to substitute your
entry’s corresponding values where appropriate.
Launch the sio job for your client using the appropriate <File> value from the Volume Move Exercise
Choices table. The following example is for the NFS entry.
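In sketch form, the invocation on rhel1 resembles the following; the path shown is hypothetical, so substitute the actual “File” value from the Volume Move Exercise Choices table:

# Hypothetical path; use your table's "File" value instead.
[root@rhel1 ~]# sio 0 0 4k 0 50m 600 2 /mnt/engineering/nfs.sio -create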
Now that sio is running, examine the volume you plan to move.
Figure 3-14:
Figure 3-15:
6. In the newly displayed pane for your chosen SVM, select the Volumes tab.
7. In the list of the SVM's Volumes, select the volume that corresponds to your chosen entry in the Volume
Move Exercise Choices Table. This example uses the “engineering” volume.
Note: This exercise assumes that your volumes initially exist on the aggregate
aggr1_cluster1_01, and that you will migrate your chosen volume to aggr1_cluster1_02.
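For reference, you can confirm a volume’s current aggregate from the CLI, and the move you are about to perform in System Manager also has a CLI equivalent (a sketch using this example’s names; substitute your own volume and aggregate):

cluster1::> volume show -vserver svm1 -volume engineering -fields aggregate
cluster1::> volume move start -vserver svm1 -volume engineering -destination-aggregate aggr1_cluster1_02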
8. On the menu bar, select the Move button. You may need to expand the window to see the button.
Figure 3-16:
Figure 3-17:
A “Move Volume” confirmation window opens asking if you are sure that you want to move the volume.
11. Click the Move button.
Figure 3-18:
Figure 3-19:
The “Move Volume” window closes and focus returns to System Manager, which now displays the “Job”
pane, where System Manager lists the current jobs that are running on the cluster.
13. In the list of current jobs, select the entry that matches the Job ID for your move operation, and observe
the “State” column for your job. You may need to scroll down the window in order to see the entry for
your job.
14. If the job’s state is “queued” or “running”, use the Refresh button to refresh the display every few
seconds until the State changes to “success”.
Figure 3-20:
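You can also monitor the move from the CLI with the volume move show command (a sketch; the fields reported vary by release):

cluster1::> volume move show -vserver svm1 -volume engineering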
17. In the newly displayed pane for your chosen SVM, select the Volumes tab.
18. In the Volume pane, examine the entry for the volume you moved, and observe which aggregate it now
resides on.
Figure 3-22:
If you do not see any error messages, or see only a group of summary statistics, then sio was
unaffected by the volume move.
If sio is still running, enter Ctrl-c to terminate the utility. When it terminates, sio may output a number of
lines of summary statistics that you can ignore for this exercise.
Important: In preparation for the next exercise, repeat the volume move procedure to move
your volume back to aggr1_cluster1_01.
Note: The table does not contain a CIFS entry because HA failover is disruptive to the CIFS configuration
used in this lab.
In order to maintain optimum simulator performance, please only select one entry at a time for the storage failover
test. Once you complete the test for one entry you are welcome to come back and test another.
Note: You have not previously used the information in the File column of the table. That is information
you will use later in this lab section.
1. You will generate an I/O load to the volume for your selected entry using the sio command, and you will
run sio from the client listed for your chosen entry.
Figure 3-23:
• If the listed client is Jumphost, then you will need to open PowerShell (assuming you do not
already have one open). The PowerShell launch icon is on the taskbar of Jumphost.
Figure 3-24:
2. The sio command syntax you use is: sio 0 0 4k 0 50m 1200 2 <File> -create where <File> is
the “File” value that comes from the Storage Failover Exercise Choices table. This command initiates
a sequential write operation to <File> for a period of 20 minutes, which should be sufficient time to
complete the storage failover exercise.
The remaining screenshots and CLI examples in this section show the steps required to complete
the storage failover exercise for the Windows and iSCSI entry from the Storage Failover Exercise
Choices table. If you chose a different entry then you will need to substitute your entry’s values where
appropriate.
Launch the sio job for your client using the appropriate <File> value from the Storage Failover Exercise
Choices table. The following example is for the Windows and iSCSI entry.
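In sketch form, the invocation resembles the CIFS example shown earlier, with <File> standing in for your table entry’s value and 1200 seconds as the duration:

PS C:\Users\Administrator.DEMO> sio 0 0 4k 0 50m 1200 2 <File> -create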
Now that sio is running on your selected client you can look at the cluster’s current failover state.
3. If you do not currently have a System Manager instance running then, on the desktop of Jumphost,
launch Chrome and log into System Manager. (If you need further assistance in opening System
Manager then see Appendix 1.)
Figure 3-25:
4. In the command bar at the top of System Manager, select the Configuration tab. You may need to
expand your browser window in order to see this tab.
5. In the left pane, under the Cluster Settings section, select High Availability.
Figure 3-26:
The “High Availability” pane shows the current failover state of the HA pair. The Cluster HA Status line at
the top of the window should show a green check mark, and the text “All nodes are paired and ready for
failover”, indicating that your cluster is correctly prepared for HA failover.
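The CLI counterpart to this pane is the storage failover show command (a sketch); each node should report that takeover is possible:

cluster1::> storage failover show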
All the volumes you created in this lab were originally hosted on an aggregate on the node cluster1-01. If
you performed the volume move exercise in this lab, then during that exercise you would have moved a
volume on that node over to an aggregate on the node cluster1-02. At the end of the exercise you should
also have moved the volume back to its original location. With that in mind, in this exercise you will have
cluster1-02 take over for cluster1-01 so you can see what effect a takeover operation has on the SVMs,
LIFs, volumes, and aggregates that are currently hosted on cluster1-01.
Cluster1-01 also hosts the cluster management LIF you are using to manage the cluster. Having
cluster1-02 take over cluster1-01 will also allow you to see how a cluster management LIF behaves
during a takeover event.
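For reference, the CLI equivalent of the takeover you are about to initiate from System Manager is (a sketch):

cluster1::> storage failover takeover -ofnode cluster1-01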
Figure 3-27:
The “Takeover Confirmation” dialog opens, and includes warnings that the node cluster1-01 contains a
cluster management LIF that may be unreachable during the failover, and warns that CIFS sessions to
cluster1-01 may be disrupted as well.
7. Click the Takeover button to proceed.
Figure 3-28:
The “Takeover Confirmation” dialog closes, and focus returns to the “High Availability” pane in System
Manager.
8. The Cluster HA Status line at the top of the pane now shows a yellow caution icon, and states that a
storage failover is in progress.
Figure 3-29:
At some point, as you are monitoring the takeover activity, you will see an HTTP error window pop
open. This is expected behavior, and is the result of the loss of connectivity to the cluster management
LIF during takeover.
11. Click the Show Details button, and read through the details to gain a better understanding of why this
error has happened.
Figure 3-30:
Figure 3-31:
Your web browser page may display a connection timeout, or a page not found error, at this point.
If this happens, periodically use the Refresh button inside the High Availability pane. It may take
several minutes before the window is able to correctly display again. If several minutes elapse and
System Manager has still not re-established connectivity, then try using your browser’s refresh button.
If your browser cannot re-connect within a minute or two using the browser refresh button, close
your browser, re-open it, log in again to System Manager, and then navigate back to the Cluster >
cluster1 > High Availability page.
Once System Manager is connected again, and depending on how frequently you use the Refresh
button inside the High Availability page during the takeover operation, you may see the node shown
as “Offline”. During a takeover operation, the node being taken over is actually undergoing a reboot. If
this were a production node on which you wanted to perform hardware maintenance, you could at this
point switch over to the controller's serial console in order to interrupt the node's boot process, and then
power off the controller. If you did this, the High Availability page in System Manager would continue
to show the node as offline for the duration of the downtime. Once you power the node back on again,
ONTAP will automatically start booting up the node, but will pause before bringing the node fully online,
not proceeding further until the node that has taken it over initiates a giveback.
In this lab you will just let the node shut down and start its reboot without interruption, so after a few
minutes, and without any action required on your part, the High Availability pane will change to indicate
that the node cluster1-01 is now ready for giveback.
13. The “Cluster HA Status” line indicates that some of the nodes are in giveback state.
14. The summary message above the diagram indicates that cluster1-01 is online and ready for giveback,
and that some LIFs are not on their home node.
15. The status line under the image for cluster1-01 states that the node is waiting for giveback.
Figure 3-33:
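For reference, the CLI equivalent of the giveback you perform in the next step is (a sketch):

cluster1::> storage failover giveback -ofnode cluster1-01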
16. Click on the Action button under cluster1-02, and select Give back node “cluster1-01”.
Figure 3-34:
A “Giveback Confirmation” window opens asking you to confirm that you want to initiate a giveback
operation from cluster1-02 to cluster1-01.
17. Click Giveback.
Figure 3-35:
The “Giveback Confirmation” window closes, and focus returns to the “High Availability” pane in System
Manager.
18. The summary message above the diagram changes to say that cluster1-01 is in a partial giveback
state, meaning that the giveback operation is in progress.
19. The status line under the image for cluster1-01 states that the node is in partial giveback.
20. This page will update every minute or so while the giveback operation is in progress, but you can use
the Refresh button to update the status information more frequently. It takes several minutes for the
giveback operation to complete.
Figure 3-36:
When the giveback operation completes, most, but not all, of the warning indicators displayed on the
High Availability pane will go back to green.
21. The icon on the Cluster HA Status line at the top of the pane reverts back to a green check mark, and
the line says that all nodes are paired and ready for takeover.
22. The summary messages above the cluster diagram indicate that both the nodes can perform takeover
on each other, but the warning that some LIFs are not on their home node remains unchanged.
Figure 3-37:
During a giveback, ONTAP will by default leave non-SAN LIFs on the node that performed the
takeover, since you will probably want to verify that everything is working correctly on the node that was
given back before you start sending all of the LIFs back to their home ports.
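When you are ready to send everything home at once, the CLI can revert every LIF in a single command (a sketch; the lab instead reverts the LIFs individually through System Manager):

cluster1::> network interface revert *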
Examine the current port assignments for your cluster’s LIFs.
23. In the command bar at the top of System Manager, select the Network tab.
24. In the Network pane select the Network Interfaces tab.
25. Click on the Interface Name heading to sort the entries in interface name order, then scroll to the
bottom of the list to find the “cluster_mgmt” and the “svm1_cifs_nfs_lif*” LIFs.
26. Notice that all of the NAS and management LIFs for svm1 have a red icon under the “Current Port”
column, which indicates that these LIFs are not currently running on their home node and port.
Figure 3-38:
Figure 3-39:
A Warning dialog will open that indicates you have chosen to migrate a cluster management LIF, which
could result in some disruption to management access, and asks you to confirm that you want to
proceed.
29. Click Yes.
Figure 3-40:
The Warning dialog closes, and shortly after an Error dialog may open stating the server is not
responding to HTTP. If this does happen (and it often does not here), this is again the result of the
cluster management LIF being temporarily unavailable as it migrates back to its home port.
30. Click OK.
Figure 3-41:
If you encounter this Error dialog, then periodically use the Refresh button inside the Network
Interfaces tab until System Manager is properly responding again. If you have not re-established
connectivity within 1-2 minutes then you will probably need to use your browser’s refresh button to
refresh the page until it is able to communicate again with System Manager. If your browser cannot re-
connect within a minute or two using the browser refresh button, close your browser, re-open it, log in
again to System Manager, and then navigate back to the Network > Network Interfaces page.
31. The red icon next to the Network Interfaces list entry for the cluster_mgmt LIF is now gone, and the
“Current Port” column indicates that the LIF is back on port cluster1-01:e0c.
Figure 3-42:
Send the other LIFs with red icons home following this same procedure.
If sio encountered no problems while writing to the file during the takeover and giveback, then it will not
generate any additional output at all, unless it has finished its targeted 20-minute execution duration. In
that case it generates a number of lines of statistical output. That statistical output is not relevant for this
exercise.
This concludes the exercise. If sio is still running, issue a Ctrl-c in the PowerShell or PuTTY window to terminate
its execution.
Figure 4-1:
Once PuTTY launches, you can connect to one of the hosts in the lab by following these steps. This
example shows a user connecting to the ONTAP cluster named “cluster1”.
2. By default PuTTY should launch into the “Basic options for your PuTTY session” display, as shown in the
screenshot. If you accidentally navigate away from this view, just click on the Session category item to
return to this view.
3. Use the scrollbar in the “Saved Sessions” box to navigate down to the desired host and double-click it
to open the connection. A terminal window will open, and you will be prompted to log into the host. You
can find the correct username and password for the host in the Lab Host Credentials table in the Lab
Environment section at the beginning of this guide.
Figure 4-2:
If you are new to the clustered ONTAP CLI, the length of the commands can seem a little intimidating.
However, the commands are actually quite easy to use if you remember these three tips:
• Make liberal use of the Tab key while entering commands, as the ONTAP command shell
supports tab completion. If you hit the Tab key while entering a portion of a command word,
the command shell will examine the context and try to complete the rest of the word for you.
If there is insufficient context to make a single match, it will display a list of all the potential
matches. Tab completion also usually works with command argument values, but there are
some cases where there is simply not enough context for it to know what you want, in which
case you will just need to type in the argument value.
• You can recall your previously entered commands by repeatedly pressing the up-arrow key,
and you can then navigate up and down the list using the up and down arrow keys. When you
find a command you want to modify, you can use the left arrow, right arrow, and Delete keys
to navigate around in a selected command to edit it.
• Entering a question mark character “?” causes the CLI to print contextual help information.
You can use this character by itself, or while entering a command.
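As an illustration of the last tip, typing a partial command followed by “?” lists the available completions (a sketch; the exact list varies by release):

cluster1::> network interface ?

The shell responds with the command verbs available at that level, such as create, delete, migrate, modify, revert, and show.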
If you would like to learn more about the features of the ONTAP CLI, the Advanced Concepts for NetApp
ONTAP 9 lab includes an extensive tutorial on this subject.
Figure 4-3:
Figure 4-4:
System Manager is now logged in to cluster1 and displays a summary page for the cluster. If you are
unfamiliar with System Manager, here is a quick introduction to its layout. Please take a few moments to
expand and browse these tabs to familiarize yourself with their contents.
4. The primary means of navigating within System Manager is through the tabs that appear under the blue
“NetApp OnCommand System Manager” bar at the top of the window. This bar of tabs is sometimes
referred to in this guide as the command bar.
Figure 4-5:
Tip: As you use System Manager in this lab, you may encounter situations where buttons at
the bottom of a System Manager pane are beyond the viewing size of the window, and no scroll
bar exists to allow you to scroll down to see them. If this happens, then you have two options:
either increase the size of the browser window (you might need to increase the resolution of
your Jumphost desktop to accommodate the larger browser window), or in the System Manager
window, use the tab key to cycle through all the various fields and buttons, which eventually
forces the window to scroll down to the non-visible items.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any
information or recommendations provided in this publication, or with respect to any results that may be obtained
by the use of the information or observance of any recommendations provided herein. The information in this
document is distributed AS IS, and the use of this information or the implementation of any recommendations or
techniques herein is a customer’s responsibility and depends on the customer’s ability to evaluate and integrate
them into the customer’s operational environment. This document and the information contained herein may be
used solely in connection with the NetApp products discussed in this document.
© 2016 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent
of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Data ONTAP®,
ONTAP®, OnCommand®, SANtricity®, FlexPod®, SnapCenter®, and SolidFire® are trademarks or registered
trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or
registered trademarks of their respective holders and should be treated as such.