Solaris Volume Manager Administration Guide
Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A.
Part No: 817-2530-10 April 2004
This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, Solstice DiskSuite, OpenBoot, Solstice Enterprise Agents, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements. U.S. Government Rights - Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements. DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Copyright 2004 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.
Contents

Preface

1  Getting Started With Solaris Volume Manager
   Finding Solaris Volume Manager Information and Tasks
   Solaris Volume Manager Roadmap - What's New
   Solaris Volume Manager Roadmap - Availability
   Solaris Volume Manager Roadmap - Storage Capacity
   Solaris Volume Manager Roadmap - I/O Performance
   Solaris Volume Manager Roadmap - Administration
   Solaris Volume Manager Roadmap - Troubleshooting

2  Storage Management Concepts
   Configuration Planning Guidelines
   Choosing Storage Mechanisms
   Performance Issues
   General Performance Guidelines
   Random I/O and Sequential I/O Optimization
   Random I/O

3  Solaris Volume Manager (Overview)
   How Does Solaris Volume Manager Manage Storage?
   How to Administer Solaris Volume Manager
   How to Access the Solaris Volume Manager Graphical User Interface
   Solaris Volume Manager Requirements
   Overview of Solaris Volume Manager Components
   Volumes
   State Database and State Database Replicas
   Hot Spare Pools
   Disk Sets
   Solaris Volume Manager Configuration Guidelines
   General Guidelines
   File System Guidelines
   Overview of Creating Solaris Volume Manager Components
   Prerequisites for Creating Solaris Volume Manager Components
   Overview of Large Volume Support in Solaris Volume Manager
   Large Volume Support Limitations
   Using Large Volumes
   Upgrading to Solaris Volume Manager

4  Configuring and Using Solaris Volume Manager (Scenario)
   Scenario Background Information
   Hardware Configuration
   Physical Storage Configuration
   Complete Solaris Volume Manager Configuration

5  State Database (Overview)
   About the Solaris Volume Manager State Database and Replicas
   Understanding the Majority Consensus Algorithm
   Background Information for Defining State Database Replicas
   Recommendations for State Database Replicas
   Guidelines for State Database Replicas
   Handling State Database Replica Errors
   Scenario - State Database Replicas

6  State Database (Tasks)
   How to Create State Database Replicas
   Maintaining State Database Replicas
   How to Check the Status of State Database Replicas
   How to Delete State Database Replicas

7  RAID 0 (Stripe and Concatenation) Volumes (Overview)
   Overview of RAID 0 Volumes
   RAID 0 (Stripe) Volume
   RAID 0 (Concatenation) Volume
   RAID 0 (Concatenated Stripe) Volume
   Background Information for Creating RAID 0 Volumes
   RAID 0 Volume Requirements
   RAID 0 Volume Guidelines
   Scenario - RAID 0 Volumes

8  RAID 0 (Stripe and Concatenation) Volumes (Tasks)
   RAID 0 Volumes (Task Map)
   Creating RAID 0 (Stripe) Volumes
   How to Create a RAID 0 (Stripe) Volume
   Creating RAID 0 (Concatenation) Volumes
   How to Create a RAID 0 (Concatenation) Volume
   Expanding Storage Space
   How to Expand Storage Space for Existing Data
   How to Expand an Existing RAID 0 Volume
   Removing a RAID 0 Volume
   How to Remove a RAID 0 Volume

9  RAID 1 (Mirror) Volumes (Overview)
   Overview of RAID 1 (Mirror) Volumes
   Overview of Submirrors
   Scenario - RAID 1 (Mirror) Volume
   Providing RAID 1+0 and RAID 0+1
   RAID 1 Volume Options
   RAID 1 Volume (Mirror) Resynchronization
   Full Resynchronization
   Optimized Resynchronization
   Configuration Guidelines for RAID 1 Volumes
   Background Information for Creating RAID 1 Volumes
   How Booting Into Single-User Mode Affects RAID 1 Volumes
   Scenario - RAID 1 Volumes (Mirrors)

10  RAID 1 (Mirror) Volumes (Tasks)
    RAID 1 Volumes (Task Map)
    Creating a RAID 1 Volume
    How to Create a RAID 1 Volume From Unused Slices
    How to Create a RAID 1 Volume From a File System
    Special Considerations for Mirroring root (/)
    Understanding Boot Time Warnings
    Booting From Alternate Boot Devices
    Working With Submirrors
    How to Attach a Submirror
    How to Detach a Submirror
    How to Place a Submirror Offline and Online
    How to Enable a Slice in a Submirror
    Maintaining RAID 1 Volumes
    How to Check the Status of Mirrors and Submirrors
    How to Change RAID 1 Volume Options
    How to Expand a RAID 1 Volume
    Responding to RAID 1 Volume Component Failures
    How to Replace a Slice in a Submirror
    How to Replace a Submirror
    Removing RAID 1 Volumes (Unmirroring)
    How to Unmirror a File System
    How to Unmirror a File System That Cannot Be Unmounted
    Using a RAID 1 Volume to Back Up Data
    How to Use a RAID 1 Volume to Make an Online Backup

11  Soft Partitions (Overview)

12  Soft Partitions (Tasks)
    Soft Partitions (Task Map)
    Creating Soft Partitions
    How to Check the Status of a Soft Partition
    How to Expand a Soft Partition
    How to Remove a Soft Partition

13  RAID 5 Volumes (Overview)
    Example - Concatenated (Expanded) RAID 5 Volume
    Background Information for Creating RAID 5 Volumes
    Requirements for RAID 5 Volumes
    Guidelines for RAID 5 Volumes
    Scenario - RAID 5 Volumes

14  RAID 5 Volumes (Tasks)
    RAID 5 Volumes (Task Map)
    Creating RAID 5 Volumes
    How to Check the Status of a RAID 5 Volume
    How to Expand a RAID 5 Volume
    How to Enable a Component in a RAID 5 Volume
    How to Replace a Component in a RAID 5 Volume

15  Hot Spare Pools (Overview)
    Overview of Hot Spares and Hot Spare Pools
    How Hot Spares Work
    Hot Spare Pools
    Example - Hot Spare Pool
    Administering Hot Spare Pools
    Scenario - Hot Spares

16  Hot Spare Pools (Tasks)
    Hot Spare Pools (Task Map)
    Creating a Hot Spare Pool
    How to Create a Hot Spare Pool
    How to Add Additional Slices to a Hot Spare Pool
    Associating a Hot Spare Pool With Volumes
    How to Associate a Hot Spare Pool With a Volume
    How to Change the Associated Hot Spare Pool
    Maintaining Hot Spare Pools
    How to Check the Status of Hot Spares and Hot Spare Pools
    How to Replace a Hot Spare in a Hot Spare Pool
    How to Delete a Hot Spare From a Hot Spare Pool
    How to Enable a Hot Spare

17  Transactional Volumes (Overview)
    About File System Logging
    Choosing a Logging Method
    Transactional Volumes
    Example - Transactional Volume
    Example - Shared Log Device
    Requirements for Working with Transactional Volumes
    Guidelines for Working with Transactional Volumes
    Checking the Status of Transactional Volumes
    Scenario - Transactional Volumes

18  Transactional Volumes (Tasks)
    Transactional Volumes (Task Map)
    Creating Transactional Volumes
    How to Attach a Log Device to a Transactional Volume
    How to Detach a Log Device from a Transactional Volume
    How to Expand a Transactional Volume
    How to Remove a Transactional Volume
    How to Remove a Transactional Volume and Retain the Mount Device
    Sharing Log Devices
    How to Share a Log Device Among File Systems
    Recovering Transactional Volumes When Errors Occur

19  Disk Sets (Overview)
    How Does Solaris Volume Manager Manage Disk Sets?
    Automatic Disk Partitioning
    Disk Set Name Requirements
    Example - Two Shared Disk Sets
    Background Information for Disk Sets
    Requirements for Disk Sets
    Guidelines for Disk Sets
    Administering Disk Sets
    Reserving a Disk Set
    Releasing a Disk Set
    Scenario - Disk Sets

20  Disk Sets (Tasks)
    Disk Sets (Task Map)
    Creating Disk Sets
    Expanding Disk Sets
    How to Add Drives to a Disk Set
    How to Add a Host to a Disk Set
    How to Create Solaris Volume Manager Components in a Disk Set
    Maintaining Disk Sets
    How to Check the Status of a Disk Set
    How to Remove Disks from a Disk Set
    How to Take a Disk Set
    How to Release a Disk Set

21  Maintaining Solaris Volume Manager (Tasks)
    Solaris Volume Manager Maintenance (Task Map)
    How to View the Solaris Volume Manager Volume Configuration
    Example - Viewing a Large Terabyte Solaris Volume Manager Volume
    Where To Go From Here
    Renaming Volumes
    Background Information for Renaming Volumes
    Exchanging Volume Names
    How to Rename a Volume
    Working with Configuration Files
    How to Initialize Solaris Volume Manager From a Configuration File
    How to Increase the Number of Default Volumes
    How to Increase the Number of Default Disk Sets
    Expanding a File System With the growfs Command
    How to Expand a File System
    Overview of Replacing and Enabling Components in RAID 1 and RAID 5 Volumes
    Enabling a Component
    Replacing a Component With Another Available Component
    Maintenance and Last Erred States
    Background Information For Replacing and Enabling Slices in RAID 1 and RAID 5 Volumes

22  Best Practices for Solaris Volume Manager
    Deploying Small Servers

23  Automatic (Top Down) Volume Creation (Tasks)
    Top Down Volume Creation (Task Map)
    Overview of Top Down Volume Creation
    Top Down Creation Capabilities
    Top Down Creation Implementation
    Top Down Creation Process
    Before You Begin
    Understanding Which Disks Are Available
    Creating Volumes Automatically
    Creating a Volume Automatically
    Analyzing Volume Creation with the metassist Command
    Creating a Command File with the metassist Command
    Creating a Volume with a Saved Shell Script Created by the metassist Command
    Creating a Volume Configuration File with the metassist Command
    Changing Default Behavior of the metassist Command
    Changing the Volume Defaults File

24  Monitoring and Error Reporting (Tasks)
    Configuring the mdmonitord Command for Periodic Error Checking
    How to Configure the mdmonitord Command for Periodic Error Checking
    Solaris Volume Manager SNMP Agent Overview
    Configuring the Solaris Volume Manager SNMP Agent
    How to Configure the Solaris Volume Manager SNMP Agent
    Limitations of the Solaris Volume Manager SNMP Agent
    Monitoring Solaris Volume Manager With a cron Job
    How to Automate Checking for Errors in Volumes

25  Troubleshooting Solaris Volume Manager (Tasks)
    Troubleshooting Solaris Volume Manager (Task Map)
    Overview of Troubleshooting the System
    Prerequisites for Troubleshooting the System
    General Troubleshooting Approach
    Replacing Disks
    Recovering from Disk Movement Problems
    Disk Movement and Device ID Overview
    Resolving Unnamed Devices Error Message
    Recovering From Boot Problems
    Background Information for Boot Problems
    How to Recover From Improper /etc/vfstab Entries
    How to Recover From a Boot Device Failure
    Recovering From State Database Replica Failures
    Repairing Transactional Volumes
    Panics
    Transactional Volume Errors
    How to Recover Configuration Data for a Soft Partition
    Recovering Configuration From a Different System
    How to Recover a Configuration

A  Important Solaris Volume Manager Files
   System Files and Startup Files
   Manually Configured Files
   Overview of the md.tab File

B  Solaris Volume Manager Quick Reference

C  Solaris Volume Manager CIM/WBEM API
   Managing Solaris Volume Manager

Index
Tables

Table 1-1 through Table 1-6  Solaris Volume Manager Roadmaps - What's New, Availability, Storage Capacity, I/O Performance, Administration, and Troubleshooting
Table 2-1  Choosing Storage Mechanisms
Table 2-2  Optimizing Redundant Storage
Table 3-2  Classes of Volumes
Table 3-3  Example Volume Names
Table 9-1  RAID 1 Volume Read Policies
Table 9-2  RAID 1 Volume Write Policies
Table 10-1  Submirror States
Table 10-2  Submirror Slice States
Table 14-1  RAID 5 States
Table 14-2  RAID 5 Slice States
Table 16-1  Hot Spare Pool States (Command Line)
Table 17-1  Transactional Volume States
Table 19-1  Example Volume Names
Table 25-1  Common Solaris Volume Manager Boot Problems
Table B-1  Solaris Volume Manager Commands
Figures

Figure 3-1  View of the Enhanced Storage tool (Solaris Volume Manager) in the Solaris Management Console
Figure 3-2  Relationship Among a Volume, Physical Disks, and Slices
Figure 4-1  Basic Hardware Diagram
Figure 7-1  RAID 0 (Stripe) Example
Figure 7-2  RAID 0 (Concatenation) Example
Figure 9-1  RAID 1 (Mirror) Example
Figure 9-2  RAID 1+0 Example
Figure 13-1  RAID 5 Volume Example
Figure 13-2  Expanded RAID 5 Volume Example
Figure 15-1  Hot Spare Pool Example
Figure 17-1  Transactional Volume Example
Figure 19-1  Disk Sets Example
Figure 23-1  The metassist command supports end-to-end processing, based on command line or files, or partial processing to allow the system administrator to provide file-based data or check volume characteristics.
Preface
The Solaris Volume Manager Administration Guide explains how to use Solaris Volume Manager to manage your system's storage needs, including creating, modifying, and using RAID 0 (concatenation and stripe) volumes, RAID 1 (mirror) volumes, and RAID 5 volumes, in addition to soft partitions and transactional log devices.
Chapter 5 describes concepts related to state databases and state database replicas.
Chapter 6 explains how to perform tasks related to state databases and state database replicas.
Chapter 7 describes concepts related to RAID 0 (stripe and concatenation) volumes.
Chapter 8 explains how to perform tasks related to RAID 0 (stripe and concatenation) volumes.
Chapter 9 describes concepts related to RAID 1 (mirror) volumes.
Chapter 10 explains how to perform tasks related to RAID 1 (mirror) volumes.
Chapter 11 describes concepts related to the Solaris Volume Manager soft partitioning feature.
Chapter 12 explains how to perform soft partitioning tasks.
Chapter 13 describes concepts related to RAID 5 volumes.
Chapter 14 explains how to perform tasks related to RAID 5 volumes.
Chapter 15 describes concepts related to hot spares and hot spare pools.
Chapter 16 explains how to perform tasks related to hot spares and hot spare pools.
Chapter 17 describes concepts related to transactional volumes.
Chapter 18 explains how to perform tasks related to transactional volumes.
Chapter 19 describes concepts related to disk sets.
Chapter 20 explains how to perform tasks related to disk sets.
Chapter 21 explains some general maintenance tasks that are not related to a specific Solaris Volume Manager component.
Chapter 22 provides some best practices information about configuring and using Solaris Volume Manager.
Chapter 23 describes concepts of and tasks related to the Solaris Volume Manager top-down volume creation feature.
Chapter 24 provides concepts and instructions for using the Solaris Volume Manager SNMP agent and for other error-checking approaches.
Chapter 25 provides information about troubleshooting and solving common problems in the Solaris Volume Manager environment.
Appendix A lists important Solaris Volume Manager files.
Appendix B provides tables that summarize commands and other helpful information. Appendix C provides a brief introduction to the CIM/WBEM API that allows open Solaris Volume Manager management from WBEM-compliant management tools.
Related Books
Solaris Volume Manager is one of several system administration tools available for the Solaris operating environment. Information about overall system administration features and functions, as well as related tools, is provided in the following:

- System Administration Guide: Basic Administration
- System Administration Guide: Advanced Administration
Typographic Conventions
The following table describes the typographic changes used in this book.

AaBbCc123 (monospace) - The names of commands, files, and directories; on-screen computer output. Example: Edit your .login file. Use ls -a to list all files. machine_name% you have mail.

AaBbCc123 (monospace bold) - What you type, contrasted with on-screen computer output. Example: machine_name% su Password:

AaBbCc123 (italic) - Command-line placeholder: replace with a real name or value. Example: To delete a file, type rm filename.

AaBbCc123 (italic) - Book titles, new words or terms, or words to be emphasized. Example: Read Chapter 6 in User's Guide. These are called class options. You must be root to do this.

Shell prompts in command examples:

C shell prompt - machine_name%
C shell superuser prompt - machine_name#
Bourne shell and Korn shell prompt - $
CHAPTER 1

Getting Started With Solaris Volume Manager
Caution - If you do not use Solaris Volume Manager correctly, you can destroy data. Solaris Volume Manager provides a powerful way to reliably manage your disks and the data on them. However, you should always maintain backups of your data, particularly before you modify an active Solaris Volume Manager configuration.
Use physical LUNs that are greater than 1 TB in size, or create logical volumes that are greater than 1 TB.

Set up storage - Create storage that spans slices by creating a RAID 0 or a RAID 5 volume. The RAID 0 or RAID 5 volume can then be used for a file system or any application, such as a database, that accesses the raw device. For instructions, see "How to Create a RAID 0 (Stripe) Volume" on page 78, "How to Create a RAID 0 (Concatenation) Volume" on page 79, "How to Create a RAID 1 Volume From Unused Slices" on page 99, "How to Create a RAID 1 Volume From a File System" on page 101, and "How to Create a RAID 5 Volume" on page 144.

Expand an existing file system - Increase the capacity of an existing file system by creating a RAID 0 (concatenation) volume, then adding additional slices. For instructions, see "How to Expand Storage Space for Existing Data" on page 81.

Expand an existing RAID 0 (concatenation or stripe) volume - Expand an existing RAID 0 volume by concatenating additional slices to it. For instructions, see "How to Expand an Existing RAID 0 Volume" on page 82.

Expand a RAID 5 volume - Expand the capacity of a RAID 5 volume by concatenating additional slices to it. For instructions, see "How to Expand a RAID 5 Volume" on page 148.

Increase the size of a UFS file system on an expanded volume - Grow a file system by using the growfs command to expand the size of a UFS while it is mounted and without disrupting access to the data. For instructions, see "How to Expand a File System" on page 240.
Subdivide slices or logical volumes into smaller partitions, breaking the 8-slice hard partition limit - Subdivide logical volumes or slices by using soft partitions.

Create a file system - Create a file system on a RAID 0 (stripe or concatenation), RAID 1 (mirror), RAID 5, or transactional volume, or on a soft partition.

Use Solaris Volume Manager's mirroring feature to maintain multiple copies of your data. You can create a RAID 1 volume from unused slices in preparation for data, or you can mirror an existing file system, including root (/) and /usr. For instructions, see "How to Create a RAID 1 Volume From Unused Slices" on page 99 and "How to Create a RAID 1 Volume From a File System" on page 101.

Increase data availability with a minimum of hardware by using Solaris Volume Manager's RAID 5 volumes. For instructions, see "How to Create a RAID 5 Volume" on page 144.

Increase data availability for an existing RAID 1 or RAID 5 volume - Create a hot spare pool, then associate it with a mirror's submirrors or with a RAID 5 volume. For instructions, see "Creating a Hot Spare Pool" on page 160 and "Associating a Hot Spare Pool With Volumes" on page 162.

Increase file system availability after reboot - Increase overall file system availability after reboot by adding UFS logging (transactional volume) to the system. Logging a file system reduces the amount of time that the fsck command has to run when the system reboots. For instructions, see "About File System Logging" on page 171.
Tune RAID 1 volume read and write policies - Specify the read and write policies for a RAID 1 volume to improve performance for a given configuration. For instructions, see "RAID 1 Volume Read and Write Policies" on page 92 and "How to Change RAID 1 Volume Options" on page 115.

Creating RAID 0 (stripe) volumes optimizes performance of the devices that make up the stripe. The interlace value can be optimized for random or sequential access. For instructions, see "Creating RAID 0 (Stripe) Volumes" on page 78.

Maintain device performance within a RAID 0 (stripe) - Expand a stripe or concatenation that has run out of space by concatenating a new component to it. A concatenation of stripes is better for performance than a concatenation of slices. For instructions, see "Expanding Storage Space" on page 81.

Graphically administer your volume management configuration - Use the Solaris Management Console to administer your volume management configuration. For instructions, see the online help from within the Solaris Volume Manager (Enhanced Storage) node of the Solaris Management Console application.

Graphically administer slices and file systems - Use the Solaris Management Console graphical user interface to administer your disks and file systems, performing such tasks as partitioning disks and constructing UFS file systems. For instructions, see the online help from within the Solaris Management Console application.

Optimize Solaris Volume Manager - Solaris Volume Manager performance is dependent on a well-designed configuration. Once created, the configuration needs monitoring and tuning. For instructions, see "Solaris Volume Manager Configuration Guidelines" on page 45 and "Working with Configuration Files" on page 235.
Because file systems tend to run out of space, you can plan for future growth by putting a file system into a concatenation.

If a disk fails, you must replace the slices used in your Solaris Volume Manager configuration. In the case of a RAID 0 volume, you have to use a new slice, delete and re-create the volume, then restore data from a backup. Slices in RAID 1 and RAID 5 volumes can be replaced and resynchronized without loss of data. For instructions, see "Responding to RAID 1 Volume Component Failures" on page 117 and "How to Replace a Component in a RAID 5 Volume" on page 150.

Special problems can arise when booting the system, due to a hardware problem or operator error. For instructions, see "How to Recover From Improper /etc/vfstab Entries" on page 285, "How to Recover From Insufficient State Database Replicas" on page 291, and "How to Recover From a Boot Device Failure" on page 287.

Work with transactional volume problems - Problems with transactional volumes can occur on either the master or the logging device, and they can be caused by either data or device problems. All transactional volumes sharing the same logging device must be fixed before they return to a usable state. For instructions, see "How to Recover a Transactional Volume With a Panic" on page 199 and "How to Recover a Transactional Volume With Hard Errors" on page 200.
CHAPTER 2

Storage Management Concepts
- "Introduction to Storage Management" on page 27
- "Configuration Planning Guidelines" on page 29
- "Performance Issues" on page 31
- "Random I/O and Sequential I/O Optimization" on page 32
Storage Hardware
There are many different devices on which data can be stored. The selection of devices to best meet your storage needs depends primarily on three factors:
- Performance
- Availability
- Cost
You can use Solaris Volume Manager to help manage the tradeoffs in performance, availability and cost. You can often mitigate many of the tradeoffs completely with Solaris Volume Manager.
Solaris Volume Manager works well with any supported storage on any system that runs the Solaris Operating Environment.
RAID Levels
RAID is an acronym for Redundant Array of Inexpensive (or Independent) Disks. RAID refers to a set of disks, called an array or a volume, that appears to the user as a single large disk drive. This array provides, depending on the configuration, improved reliability, response time, or storage capacity. Technically, there are six RAID levels, 0-5. Each level refers to a method of distributing data while ensuring data redundancy. (RAID level 0 does not provide data redundancy, but is usually included as a RAID classification anyway. RAID level 0 provides the basis for the majority of RAID configurations in use.) Very few storage environments support RAID levels 2, 3, and 4, so those environments are not described here. Solaris Volume Manager supports the following RAID levels:
- RAID Level 0 - Although stripes and concatenations do not provide redundancy, these constructions are often referred to as RAID 0. Basically, data are spread across relatively small, equally-sized fragments that are allocated alternately and evenly across multiple physical disks. Any single drive failure can cause data loss. RAID 0 offers a high data transfer rate and high I/O throughput, but suffers lower reliability and lower availability than a single disk.
- RAID Level 1 - Mirroring uses equal amounts of disk capacity to store data and a copy (mirror) of the data. Data is duplicated, or mirrored, over two or more physical disks. Data can be read from both drives simultaneously, meaning that either drive can service any request, which provides improved performance. If one physical disk fails, you can continue to use the mirror with no loss in performance or loss of data. Solaris Volume Manager supports both RAID 0+1 and (transparently) RAID 1+0 mirroring, depending on the underlying devices. See "Providing RAID 1+0 and RAID 0+1" on page 89 for details.
- RAID Level 5 - RAID 5 uses striping to spread the data over the disks in an array. RAID 5 also records parity information to provide some data redundancy. A RAID level 5 volume can withstand the failure of an underlying device without failing. If a RAID level 5 volume is used in conjunction with hot spares, the volume can withstand multiple failures without failing. A RAID level 5 volume will have a substantial performance degradation when operating with a failed device. In the RAID 5 model, every device has one area that contains a parity stripe and others that contain data. The parity is spread over all of the disks in the array, which reduces the write time. Write time is reduced because writes do not have to wait until a dedicated parity disk can accept the data.
Solaris Volume Manager provides the following:

- RAID 0 (concatenation and stripe) volumes
- RAID 1 (mirror) volumes
- RAID 5 volumes
- Soft partitions
- Transactional (logging) volumes
- File systems that are constructed on Solaris Volume Manager volumes

TABLE 2-1  Choosing Storage Mechanisms (compares the storage mechanisms, including RAID 1 (mirror), RAID 5, and soft partitions, against requirements such as improved write performance, more than 8 slices per device, and larger available storage space)
TABLE 2-2  Optimizing Redundant Storage
- RAID 0 devices (stripes and concatenations) and soft partitions do not provide any redundancy of data.
- Concatenation works well for small random I/O.
- Striping performs well for large sequential I/O and for random I/O distributions.
- Mirroring might improve read performance, but write performance is always degraded in mirrors.
- Because of the read-modify-write nature of RAID 5 volumes, volumes with over about 20 percent writes should not be RAID 5. If redundancy is required, consider mirroring.
- RAID 5 writes cannot be as fast as mirrored writes, which in turn cannot be as fast as unprotected writes.
- Soft partitions are useful for managing very large storage devices.
Note - In addition to these generic storage options, see "Hot Spare Pools" on page 44 for more information about using Solaris Volume Manager to support redundant devices.
Performance Issues
General Performance Guidelines
When you design your storage configuration, consider the following performance guidelines:
- Striping generally has the best performance, but striping offers no data redundancy. For write-intensive applications, RAID 1 volumes generally have better performance than RAID 5 volumes.
- RAID 1 and RAID 5 volumes both increase data availability, but both volume types generally have lower performance for write operations. Mirroring does improve random read performance.
- RAID 5 volumes have a lower hardware cost than RAID 1 volumes, while RAID 0 volumes have no additional hardware cost.
- Identify the most frequently accessed data, and increase access bandwidth to that data with mirroring or striping. Both stripes and RAID 5 volumes distribute data across multiple disk drives and help balance the I/O load.
- Use available performance monitoring capabilities and generic tools such as the iostat command to identify the most frequently accessed data. Once identified, the access bandwidth to this data can be increased using striping, RAID 1 volumes, or RAID 5 volumes.
- The performance of soft partitions can degrade when the soft partition size is changed multiple times.
- RAID 5 volume performance is lower than stripe performance for write operations. This performance penalty results from the multiple I/O operations required to calculate and store the RAID 5 volume parity.
- For raw random I/O reads, the stripe and the RAID 5 volume are comparable. Both the stripe and the RAID 5 volume split the data across multiple disks. RAID 5 volume parity calculations are not a factor in reads except after a slice failure.
- For raw random I/O writes, the stripe is superior to RAID 5 volumes.
Random I/O
In a random I/O environment, such as an environment used for databases and general-purpose file servers, all disks should spend equal amounts of time servicing I/O requests. For example, assume that you have 40 Gbytes of storage for a database application. If you stripe across four 10-Gbyte disk spindles, and if the I/O is random and evenly dispersed across the volume, then each of the disks will be equally busy, which generally improves performance.

The target for maximum random I/O performance on a disk is 35 percent or lower usage, as reported by the iostat command. Disk use in excess of 65 percent on a typical basis is a problem. Disk use in excess of 90 percent is a significant problem. The solution to having disk use values that are too high is to create a new RAID 0 volume with more disks (spindles).
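As a hedged illustration of how you might observe these utilization levels (a minimal sketch; the 30-second interval is an arbitrary choice), the iostat command can report extended statistics, including the percentage of time each disk is busy in the %b column:

# iostat -xn 30

Disks that consistently report %b values above the thresholds described above are candidates for a wider stripe.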
Note - Simply attaching additional disks to an existing volume cannot improve performance. You must create a new volume with the ideal parameters to optimize performance.
The interlace size of the stripe does not matter because you just want to spread the data across all the disks. Any interlace value greater than the typical I/O request will do.
In sequential applications, the typical I/O size is usually large, meaning more than 128 Kbytes or even more than 1 Mbyte. Assume an application with a typical I/O request size of 256 Kbytes, and assume striping across 4 disk spindles. 256 Kbytes / 4 = 64 Kbytes. So, a good choice for the interlace size would be 32 to 64 Kbytes.
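For example, a minimal sketch of creating such a stripe with the metainit command (the volume name d20 and the slice names are placeholders) might look like the following, where -i sets the interlace size:

# metainit d20 1 4 c1t1d0s2 c2t1d0s2 c3t1d0s2 c4t1d0s2 -i 64k

This command builds one stripe (1) that is four slices wide (4) with a 64-Kbyte interlace.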
CHAPTER 3

Solaris Volume Manager (Overview)
- "What Does Solaris Volume Manager Do?" on page 35
- "Solaris Volume Manager Requirements" on page 38
- "Overview of Solaris Volume Manager Components" on page 38
- "Solaris Volume Manager Configuration Guidelines" on page 45
- "Overview of Creating Solaris Volume Manager Components" on page 46
- Increasing storage capacity
- Increasing data availability
- Easing administration of large storage devices
In some instances, Solaris Volume Manager can also improve I/O performance.
A volume is functionally identical to a physical disk in the view of an application or a file system. Solaris Volume Manager converts I/O requests directed at a volume into I/O requests to the underlying member disks. Solaris Volume Manager volumes are built from disk slices or from other Solaris Volume Manager volumes.

An easy way to build volumes is to use the graphical user interface that is built into the Solaris Management Console. The Enhanced Storage tool within the Solaris Management Console presents you with a view of all the existing volumes. By following the steps in wizards, you can easily build any kind of Solaris Volume Manager volume or component. You can also build and modify volumes by using Solaris Volume Manager command-line utilities.

For example, if you need more storage capacity as a single volume, you could use Solaris Volume Manager to make the system treat a collection of slices as one larger volume. After you create a volume from these slices, you can immediately begin using the volume just as you would use any real slice or device. For a more detailed discussion of volumes, see "Volumes" on page 39.

Solaris Volume Manager can increase the reliability and availability of data by using RAID 1 (mirror) volumes and RAID 5 volumes. Solaris Volume Manager hot spares can provide another level of data availability for mirrors and RAID 5 volumes. Once you have set up your configuration, you can use the Enhanced Storage tool within the Solaris Management Console to report on its operation.
- Solaris Management Console - This tool provides a graphical user interface to volume management functions. Use the Enhanced Storage tool within the Solaris Management Console, as illustrated in Figure 3-1. This interface provides a graphical view of Solaris Volume Manager components, including volumes, hot spare pools, and state database replicas. This interface offers wizard-based manipulation of Solaris Volume Manager components, enabling you to quickly configure your disks or change an existing configuration.
- The command line - You can use several commands to perform volume management functions. The Solaris Volume Manager core commands begin with meta, for example the metainit and metastat commands. For a list of Solaris Volume Manager commands, see Appendix B.
Note - Do not attempt to administer Solaris Volume Manager with the command line and the graphical user interface at the same time. Conflicting changes could be made to the configuration, and the behavior would be unpredictable. You can use both tools to administer Solaris Volume Manager, but not concurrently.
FIGURE 3-1  View of the Enhanced Storage tool (Solaris Volume Manager) in the Solaris Management Console
1. Start the Solaris Management Console.
% /usr/sbin/smc
2. Double-click This Computer.
3. Double-click Storage.
4. Double-click Enhanced Storage to load the Solaris Volume Manager tools.
5. If prompted to log in, log in as root or as a user who has equivalent access.
6. Double-click the appropriate icon to manage volumes, hot spare pools, state database replicas, and disk sets.
Tip - All tools in the Solaris Management Console display information in the bottom section of the page or at the left side of a wizard panel. Choose Help at any time to find additional information about performing tasks in this interface.
You must have root privilege to administer Solaris Volume Manager. Equivalent privileges granted through the User Profile feature in the Solaris Management Console allow administration through the Solaris Management Console. However, only the root user can use the Solaris Volume Manager command-line interface.

Before you can create volumes with Solaris Volume Manager, state database replicas must exist on the Solaris Volume Manager system. At least three replicas should exist, and the replicas should be placed on different controllers and different disks for maximum reliability. See "About the Solaris Volume Manager State Database and Replicas" on page 53 for more information about state database replicas, and "Creating State Database Replicas" on page 62 for instructions on how to create state database replicas.
TABLE 3-1  Solaris Volume Manager components

- RAID 0 volumes (stripe, concatenation, concatenated stripe), RAID 1 (mirror) volumes, RAID 5 volumes - A group of physical slices that appear to the system as a single, logical device. For more information, see "Volumes" on page 39.
- Soft partitions - Subdivisions of physical slices or logical volumes that provide smaller, more manageable storage units. Their purpose is to improve manageability of large storage volumes.
- State database (state database replicas) - A database that stores information on disk about the state of your Solaris Volume Manager configuration. Solaris Volume Manager cannot operate until you have created the state database replicas. For more information, see "State Database and State Database Replicas" on page 43.
- Hot spare pool - A collection of slices (hot spares) reserved to be automatically substituted in case of component failure in either a submirror or RAID 5 volume. Its purpose is to increase data availability for RAID 1 and RAID 5 volumes. For more information, see "Hot Spare Pools" on page 44.
- Disk set - A set of shared disk drives in a separate namespace that contain volumes and hot spares and that can be non-concurrently shared by multiple hosts. Its purpose is to provide data redundancy and availability and to provide a separate namespace for easier administration.
Volumes
A volume is a name for a group of physical slices that appear to the system as a single, logical device. Volumes are actually pseudo, or virtual, devices in standard UNIX terms.
Note Historically, the Solstice DiskSuite product referred to these logical devices as metadevices. However, for simplicity and standardization, this book refers to these devices as volumes.
Classes of Volumes
You create a volume as a RAID 0 (concatenation or stripe) volume, a RAID 1 (mirror) volume, a RAID 5 volume, a soft partition, or a transactional logging volume. You can use either the Enhanced Storage tool within the Solaris Management Console or the command-line utilities to create and administer volumes. The following table summarizes the classes of volumes:
TABLE 3-2  Classes of Volumes

- RAID 0 volume - Can be used directly, or as the basic building blocks for mirrors and transactional devices. RAID 0 volumes do not directly provide data redundancy.
- RAID 1 (mirror) volume - Replicates data by maintaining multiple copies. A RAID 1 volume is composed of one or more RAID 0 volumes that are called submirrors.
- RAID 5 volume - Replicates data by using parity information. In the case of disk failure, the missing data can be regenerated by using available data and the parity information. A RAID 5 volume is generally composed of slices. One slice's worth of space is allocated to parity information, but the parity is distributed across all slices in the RAID 5 volume.
- Transactional volume - Used to log a UFS file system. (UFS logging is a preferable solution to this need, however.) A transactional volume is composed of a master device and a logging device. Both of these devices can be a slice, RAID 0 volume, RAID 1 volume, or RAID 5 volume. The master device contains the UFS file system.
- Soft partition - Divides a slice or logical volume into one or more smaller, extensible volumes.
FIGURE 3-2  Relationship Among a Volume, Physical Disks, and Slices (slices c1t1d0s2 on Disk A and c2t2d0s2 on Disk B make up volume d0)
Applications and databases that use the raw volume must have their own method to grow the added space so that applications can recognize it. Solaris Volume Manager does not provide this capability.
You can expand the disk space in volumes in the following ways (a brief command sketch follows this list):

- Adding one or more slices to a RAID 0 volume
- Adding a slice or multiple slices to all submirrors of a RAID 1 volume
- Adding one or more slices to a RAID 5 volume
- Expanding a soft partition with additional space from the underlying component
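For example, a hedged sketch of the command-line form of such an expansion (the volume name d8, the slice c2t1d0s2, and the mount point /export are placeholders) might look like the following, where metattach adds a slice to the volume and growfs then grows the UFS file system on it:

# metattach d8 c2t1d0s2
# growfs -M /export /dev/md/rdsk/d8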
Volume Names
Volume Name Requirements
There are a few rules that you must follow when assigning names for volumes:
- Volume names must begin with the letter d followed by a number (for example, d0).
- Instead of specifying the full volume name, such as /dev/md/dsk/d1, you can often use an abbreviated volume name, such as d1, with any meta* command.
- Like physical slices, volumes have logical names that appear in the file system. Logical volume names have entries in the /dev/md/dsk directory for block devices and the /dev/md/rdsk directory for raw devices.
- You can generally rename a volume, as long as the volume is not currently being used and the new name is not being used by another volume. For more information, see "Exchanging Volume Names" on page 233.
- Solaris Volume Manager has 128 default volume names from 0-127. The following table shows some example volume names.
TABLE 3-3  Example Volume Names
- Use ranges for each particular type of volume. For example, assign numbers 0-20 for RAID 1 volumes, 21-40 for RAID 0 volumes, and so on.
- Use a naming relationship for mirrors. For example, name mirrors with a number that ends in zero (0), and submirrors with numbers that end in one (1) and two (2). For example: mirror d10, submirrors d11 and d12; mirror d20, submirrors d21 and d22, and so on. (A brief sketch follows this list.)
- Use a naming method that maps the slice number and disk number to volume numbers.
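As a hedged sketch of the mirror-naming relationship described above (the volume and slice names are hypothetical), a mirror d10 built from submirrors d11 and d12 could be created as follows:

# metainit d11 1 1 c1t0d0s0
# metainit d12 1 1 c2t0d0s0
# metainit d10 -m d11
# metattach d10 d12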
Solaris Volume Manager recognizes when a slice contains a state database replica, and automatically skips over the replica if the slice is used in a volume. The part of a slice reserved for the state database replica should not be used for any other purpose.

You can keep more than one copy of a state database on one slice. However, you might make the system more vulnerable to a single point-of-failure by doing so.

The system continues to function correctly if all state database replicas are deleted. However, the system loses all Solaris Volume Manager configuration data if a reboot occurs with no existing state database replicas on disk.
Disk Sets
A shared disk set, or simply disk set, is a set of disks that contain state database replicas, volumes, and hot spares. This pool can be shared exclusively but not concurrently by multiple hosts. A disk set provides for data availability in a clustered environment. If one host fails, another host can take over the failed host's disk set. (This type of configuration is known as a failover configuration.) Additionally, disk sets can be used to help manage the Solaris Volume Manager name space, and to provide ready access to network-attached storage devices. For more information, see Chapter 19.
General Guidelines
- Disk and controllers - Place drives in a volume on separate drive paths. For SCSI drives, this means separate host adapters. An I/O load distributed over several controllers improves volume performance and availability.
- System files - Never edit or remove the /etc/lvm/mddb.cf or /etc/lvm/md.cf files. Make sure these files are backed up on a regular basis.
- Volume integrity - If a slice is defined as a volume, do not use the underlying slice for any other purpose, including using the slice as a dump device.
- Maximum volumes - The maximum number of volumes that are supported in a disk set is 8192 (but the default number of volumes is 128). To increase the number of default volumes, edit the /kernel/drv/md.conf file. See "System Files and Startup Files" on page 303 for more information on this file.
- Information about disks and partitions - Have a copy of output from the prtvtoc and metastat -p commands in case you need to reformat a bad disk or re-create your Solaris Volume Manager configuration, as sketched after this list.
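For example, a hedged sketch of capturing that information (the disk name c0t0d0 and the output file names are placeholders) might look like:

# prtvtoc /dev/rdsk/c0t0d0s2 > /etc/lvm/c0t0d0.vtoc
# metastat -p > /etc/lvm/md.tab.backup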
File System Guidelines

Do not mount file systems on a volume's underlying slice. If a slice is used for a volume of any kind, you must not mount that slice as a file system. If possible, unmount any physical device that you intend to use as a volume before you activate the volume. For example, if you create a transactional volume for a UFS, in the /etc/vfstab file, you would specify the transactional volume name as the device to mount and fsck.
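As a minimal sketch of such an /etc/vfstab entry (the volume name d1 and the mount point /export are placeholders), the volume is named as both the device to mount and the device to fsck:

/dev/md/dsk/d1   /dev/md/rdsk/d1   /export   ufs   2   yes   -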
- State database replicas
- Volumes (RAID 0 (stripes, concatenations), RAID 1 (mirrors), RAID 5, soft partitions, and transactional volumes)
- Hot spare pools
- Disk sets
Note For suggestions on how to name volumes, see Volume Names on page 42.
- Create initial state database replicas. If you have not done so, see "Creating State Database Replicas" on page 62.
- Identify slices that are available for use by Solaris Volume Manager. If necessary, use the format command, the fmthard command, or the Solaris Management Console to repartition existing disks.
- Make sure you have root privilege.
- Have a current backup of all data.
- If you are using the graphical user interface, start the Solaris Management Console and maneuver through the interface to use the Solaris Volume Manager feature. For information, see "How to Access the Solaris Volume Manager Graphical User Interface" on page 37.
Solaris Volume Manager allows system administrators to do the following:

1. Create, modify, and delete logical volumes built on or from logical storage units (LUNs) greater than 1 TB in size.
2. Create, modify, and delete logical volumes that exceed 1 TB in size.

Support for large volumes is automatic. If a device greater than 1 TB is created, Solaris Volume Manager configures it appropriately and without user intervention.
- If a system with large volumes is rebooted under a 32-bit Solaris 9 4/03 or later kernel, the large volumes will be visible through metastat output, but they cannot be accessed, modified, or deleted, and no new large volumes can be created. Any volumes or file systems on a large volume in this situation will also be unavailable.
- If a system with large volumes is rebooted under a release of Solaris prior to Solaris 9 4/03, Solaris Volume Manager will not start. All large volumes must be removed before Solaris Volume Manager will run under another version of the Solaris Operating Environment.
- Solaris Volume Manager transactional volumes do not support large volumes. In all cases, UFS logging (see mount_ufs(1M)) provides better performance than transactional volumes, and UFS logging does support large volumes as well.
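As a quick, hedged check of which kernel is running before you create large volumes (the exact output wording varies by platform), you can use the isainfo command; a 64-bit kernel reports 64-bit kernel modules:

# isainfo -kv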
Caution - Do not create large volumes if you expect to run the Solaris Operating Environment with a 32-bit kernel or if you expect to use a version of the Solaris Operating Environment prior to Solaris 9 4/03.
CHAPTER 4

Configuring and Using Solaris Volume Manager (Scenario)
- "Scenario Background Information" on page 49
- "Complete Solaris Volume Manager Configuration" on page 51
Hardware Configuration

The hardware system is configured as follows:

- There are 3 physically separate controllers (c0 - IDE, c1 - SCSI, and c2 - SCSI).
- Each SCSI controller connects to a MultiPack that contains 6 internal 9-Gbyte disks (c1t1 through c1t6 and c2t1 through c2t6).
- Each controller/terminator pair (cntn) has 8.49 Gbytes of usable storage space.
- Storage space on the root (/) drive c0t0d0 is split into 6 partitions.
FIGURE 4-1  Basic Hardware Diagram
Physical Storage Configuration

- The SCSI controller/terminator pairs (cntn) have approximately 20 Gbytes of storage space.
- Storage space on each disk (for example, c1t1d0) is split into 7 partitions (cntnd0s0 through cntnd0s6). To partition a disk, follow the procedures explained in "Formatting a Disk" in System Administration Guide: Basic Administration.
CHAPTER 5

State Database (Overview)
- "About the Solaris Volume Manager State Database and Replicas" on page 53
- "Understanding the Majority Consensus Algorithm" on page 55
- "Background Information for Defining State Database Replicas" on page 56
- "Handling State Database Replica Errors" on page 57
It is because of the majority consensus algorithm that you must create at least three state database replicas when you set up your disk configuration. A consensus can be reached as long as at least two of the three state database replicas are available. During booting, Solaris Volume Manager ignores corrupted state database replicas. In some cases, Solaris Volume Manager tries to rewrite state database replicas that are corrupted. Otherwise, they are ignored until you repair them. If a state database replica becomes corrupted because its underlying slice encountered an error, you will need to repair or replace the slice and then enable the replica.
Caution - Do not place state database replicas on fabric-attached storage, SANs, or other storage that is not directly attached to the system. Replicas must be on storage devices that are available at the same point in the boot process as traditional SCSI or IDE drives.
If all state database replicas are lost, you could, in theory, lose all data that is stored on your Solaris Volume Manager volumes. For this reason, it is good practice to create enough state database replicas on separate drives and across controllers to prevent catastrophic failure. It is also wise to save your initial Solaris Volume Manager configuration information, as well as your disk partition information. See Chapter 6 for information on adding additional state database replicas to the system, and on recovering when state database replicas are lost.

State database replicas are also used for RAID 1 volume resynchronization regions. Too few state database replicas relative to the number of mirrors might cause replica I/O to impact RAID 1 volume performance. That is, if you have a large number of mirrors, make sure that you have a total of at least two state database replicas per RAID 1 volume, up to the maximum of 50 replicas per disk set.

Each state database replica occupies 4 Mbytes (8192 disk sectors) of disk storage by default. Replicas can be stored on the following devices:
- a dedicated local disk partition
- a local partition that will be part of a volume
- a local partition that will be part of a UFS logging device
Note - Replicas cannot be stored on the root (/), swap, or /usr slices, or on slices that contain existing file systems or data. After the replicas have been stored, volumes or file systems can be placed on the same slice.
Note - Replicas cannot be stored on fabric-attached storage, SANs, or other storage that is not directly attached to the system. Replicas must be on storage devices that are available at the same point in the boot process as traditional SCSI or IDE drives.
- The system will stay running if at least half of the state database replicas are available.
- The system will panic if fewer than half of the state database replicas are available.
- The system will not reboot into multiuser mode unless a majority (half + 1) of the total number of state database replicas is available.
If insufficient state database replicas are available, you will have to boot into single-user mode and delete enough of the bad or missing replicas to achieve a quorum. See How to Recover From Insufficient State Database Replicas on page 291.
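As a hedged sketch of how you might confirm how many replicas exist and whether any have errors (no options beyond -i are assumed here), the metadb command lists each replica along with status flags:

# metadb -i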
Note - When the number of state database replicas is odd, Solaris Volume Manager computes the majority by dividing the number in half, rounding down to the nearest integer, then adding 1 (one). For example, on a system with seven replicas, the majority would be four (seven divided by two is three and one-half, rounded down is three, plus one is four).
- You should create state database replicas on a dedicated slice of at least 4 Mbytes per replica. If necessary, you could create state database replicas on a slice that will be used as part of a RAID 0, RAID 1, or RAID 5 volume, soft partitions, or transactional (master or log) volumes. You must create the replicas before you add the slice to the volume. Solaris Volume Manager reserves the starting part of the slice for the state database replica.
- You can create state database replicas on slices that are not in use.
- You cannot create state database replicas on existing file systems, or on the root (/), /usr, and swap file systems. If necessary, you can create a new slice (provided a slice name is available) by allocating space from swap, and then put state database replicas on that new slice.
- A minimum of 3 state database replicas is recommended, up to a maximum of 50 replicas per Solaris Volume Manager disk set. The following guidelines are recommended (see the sketch after this list for an example):
  - For a system with only a single drive: put all three replicas in one slice.
  - For a system with two to four drives: put two replicas on each drive.
  - For a system with five or more drives: put one replica on each drive.
- If you have a RAID 1 volume that will be used for small-sized random I/O (as in a database), be sure that you have at least two extra replicas per RAID 1 volume on slices (and preferably on disks and controllers) that are unconnected to the RAID 1 volume, for best performance.
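For example, a minimal sketch of the two-to-four drive guideline above (the slice names c0t0d0s7 and c0t1d0s7 are placeholders) creates two replicas on a dedicated slice of each of two drives:

# metadb -a -f -c 2 c0t0d0s7 c0t1d0s7

The -f option is needed only the first time, when no replicas exist yet.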
You can add additional state database replicas to the system at any time. The additional state database replicas help ensure Solaris Volume Manager availability.
Caution - If you upgraded from Solstice DiskSuite to Solaris Volume Manager and you have state database replicas sharing slices with file systems or logical volumes (as opposed to on separate slices), do not delete the existing replicas and replace them with new replicas in the same location.
The default state database replica size in Solaris Volume Manager is 8192 blocks, while the default size in Solstice DiskSuite was 1034 blocks. If you delete a default-sized state database replica from Solstice DiskSuite, then add a new default-sized replica with Solaris Volume Manager, you will overwrite the first 7158 blocks of any file system that occupies the rest of the shared slice, thus destroying the data.
When a state database replica is placed on a slice that becomes part of a volume, the capacity of the volume is reduced by the space that is occupied by the replica(s). The space used by a replica is rounded up to the next cylinder boundary and this space is skipped by the volume. By default, the size of a state database replica is 4 Mbytes or 8192 disk blocks. Because your disk slices might not be that small, you might want to resize a slice to hold the state database replica. For information about resizing a slice, see Administering Disks (Tasks) in System Administration Guide: Basic Administration. If multiple controllers exist, replicas should be distributed as evenly as possible across all controllers. This strategy provides redundancy in case a controller fails and also helps balance the load. If multiple disks exist on a controller, at least two of the disks on each controller should store a replica.
For example, assume you have four replicas. The system will stay running as long as two replicas (half the total number) are available. However, to reboot the system, three replicas (half the total plus one) must be available. In a two-disk configuration, you should always create at least two replicas on each disk. For example, assume you have a configuration with two disks, and you only create three replicas (two replicas on the first disk and one replica on the second disk). If the disk with two replicas fails, the system will panic because the remaining disk only has one replica, and this is less than half the total number of replicas.
Note If you create two replicas on each disk in a two-disk configuration, Solaris Volume Manager will still function if one disk fails. But because you must have one more than half of the total replicas available for the system to reboot, you will be unable to reboot.
What happens if a slice that contains a state database replica fails? The rest of your configuration should remain in operation. Solaris Volume Manager finds a valid state database during boot (as long as there are at least half plus one valid state database replicas). What happens when state database replicas are repaired? When you manually repair or enable state database replicas, Solaris Volume Manager updates them with valid data.
A minimal configuration could put a single state database replica on slice 7 of the root disk, and then an additional replica on slice 7 of one disk on each of the other two controllers. To help protect against the admittedly remote possibility of media failure, using two replicas on the root disk and then two replicas on two different disks on each controller, for a total of six replicas, provides more than adequate security. To round out the total, add two additional replicas for each of the six mirrors, on different disks than the mirrors. This configuration results in a total of 18 replicas, with two on the root disk and eight on each of the SCSI controllers, distributed across the disks on each controller.
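As a rough sketch of the six-replica baseline described above (the slice names are hypothetical and assume that slice 7 of each disk has been set aside for replicas), the layout could be created with commands of the following form:
# metadb -a -f -c 2 c0t0d0s7
# metadb -a -c 2 c1t0d0s7 c2t0d0s7
The -f option is needed only on the first invocation, when no replicas exist yet. The -c 2 option places two replicas on each slice that is listed.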
CHAPTER
- Create state database replicas. Use the Solaris Volume Manager GUI or the metadb -a command to create state database replicas. For instructions, see How to Create State Database Replicas on page 62.
- Check the status of state database replicas. Use the Solaris Volume Manager GUI or the metadb command to check the status of existing replicas. For instructions, see How to Check the Status of State Database Replicas on page 64.
- Delete state database replicas. Use the Solaris Volume Manager GUI or the metadb -d command to delete state database replicas. For instructions, see How to Delete State Database Replicas on page 65.
The default state database replica size in Solaris Volume Manager is 8192 blocks, while the default size in Solstice DiskSuite was 1034 blocks. If you delete a default-sized state database replica from Solstice DiskSuite, and then add a new default-sized replica with Solaris Volume Manager, you will overwrite the first 7158 blocks of any file system that occupies the rest of the shared slice, thus destroying the data.
Caution Do not replace default-sized (1034 block) state database replicas from Solstice DiskSuite with default-sized Solaris Volume Manager replicas (8192 blocks) on a slice shared with a file system. If you do, the new replicas will overwrite the beginning of your file system and corrupt it.
Caution Do not place state database replicas on fabric-attached storage, SANs, or other storage that is not directly attached to the system. Replicas must be on storage devices that are available at the same point in the boot process as traditional SCSI or IDE drives.
From the Enhanced Storage tool within the Solaris Management Console, open the State Database Replicas node. Choose Action->Create Replicas and follow the instructions. For more information, see the online help. Use the following form of the metadb command. See the metadb(1M) man page for more information.
metadb -a -c n -l nnnn -f ctds-of-slice
-a specifies to add a state database replica.
-f specifies to force the operation, even if no replicas exist.
-c n specifies the number of replicas to add to the specified slice.
-l nnnn specifies the size of the new replicas, in blocks.
ctds-of-slice specifies the name of the component that will hold the replica.
The -a option adds the additional state database replica to the system, and the -f option forces the creation of the first replica (and may be omitted when you add supplemental replicas to the system).
The -a option adds additional state database replicas to the system. The -c 2 option places two replicas on the specified slice. The metadb command checks that the replicas are active, as indicated by the -a. You can also specify the size of the state database replica with the -l option, followed by the number of blocks. However, the default size of 8192 should be appropriate for virtually all configurations, including those configurations with thousands of logical volumes.
# metadb -a -c 3 -l 1034 c0t0d0s7
# metadb
        flags           first blk
...
     a        u         16
     a        u         1050
     a        u         2084
The -a option adds the additional state database replica to the system, and the -l option specifies the length in blocks of the replica to add.
From the Enhanced Storage tool within the Solaris Management Console, open the State Database Replicas node to view all existing state database replicas. For more information, see the online help. Use the metadb command to view the status of state database replicas. Add the -i option to display a key to the status flags, as shown in the following example. See the metadb(1M) man page for more information.
p    replica's location was patched in kernel
m    replica is master, this is replica selected as input
W    replica has device write errors
a    replica is active, commits are occurring to this replica
M    replica had problem with master blocks
D    replica had problem with data blocks
F    replica had format problems
S    replica is too small to hold current data base
R    replica had device read errors
A legend of all the flags follows the status. The characters in front of the device name represent the status. Uppercase letters indicate a problem status. Lowercase letters indicate an Okay status.
From the Enhanced Storage tool within the Solaris Management Console, open the State Database Replicas node to view all existing state database replicas. Select replicas to delete, then choose Edit->Delete to remove them. For more information, see the online help. Use the following form of the metadb command:
metadb -d -f ctds-of-slice
-d specifies to delete a state database replica.
-f specifies to force the operation, even if no replicas exist.
ctds-of-slice specifies the name of the component that holds the replica.
Note that you need to specify each slice from which you want to remove the state database replica. See the metadb(1M) man page for more information.
This example shows the last replica being deleted from a slice. You must add a -f option to force deletion of the last replica on the system.
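For example, a command of the following form (the slice name is hypothetical) forces deletion of the last replica remaining on a slice:
# metadb -d -f c0t0d0s7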
CHAPTER
- Overview of RAID 0 Volumes on page 67
- Background Information for Creating RAID 0 Volumes on page 74
- Scenario: RAID 0 Volumes on page 75
There are three kinds of RAID 0 volumes:
- Striped volumes (or stripes)
- Concatenated volumes (or concatenations)
- Concatenated striped volumes (or concatenated stripes)
Note A component refers to any device, from slices to soft partitions, used in another logical volume.
A stripe spreads data equally across all components in the stripe, while a concatenated volume writes data to the first available component until it is full, then moves to the next available component. A concatenated stripe is simply a stripe that has been expanded from its original configuration by adding additional components.
RAID 0 volumes allow you to quickly and simply expand disk storage capacity. The drawback is that these volumes do not provide any data redundancy, unlike RAID 1 or RAID 5 volumes. If a single component fails on a RAID 0 volume, data is lost. You can use a RAID 0 volume containing a single slice for any file system. You can use a RAID 0 volume that contains multiple components for any file system except the following:
- root (/)
- /usr
- swap
- /var
- /opt
- Any file system that is accessed during an operating system upgrade or installation
Note When you mirror root (/), /usr, swap, /var, or /opt, you put the file system into a one-way concatenation or stripe (a concatenation of a single slice) that acts as a submirror. This one-way concatenation is mirrored by another submirror, which must also be a concatenation.
When you create a stripe, you can set the interlace value or use the Solaris Volume Manager default interlace value of 16 Kbytes. Once you have created the stripe, you cannot change the interlace value. However, you could back up the data on it, delete the stripe, create a new stripe with a new interlace value, and then restore the data.
FIGURE 7-1 RAID 0 (Stripe) Volume Example. (Diagram: interlaces 1 through 6 from physical slices A, B, and C are interleaved by Solaris Volume Manager and presented as a single RAID 0 stripe volume.)
A concatenation can also expand any active and mounted UFS file system without having to bring down the system. In general, the total capacity of a concatenation is equal to the total size of all the components in the concatenation. If a concatenation contains a slice with a state database replica, the total capacity of the concatenation would be the sum of the components minus the space that is reserved for the replica.
You can also create a concatenation from a single component. Later, when you need more storage, you can add more components to the concatenation.
Note You must use a concatenation to encapsulate root (/), swap, /usr, /opt, or /var when mirroring these file systems.
Scenario: RAID 0 (Concatenation)
Figure 7-2 illustrates a concatenation that is made of three components (slices). The data blocks, or chunks, are written sequentially across the components, beginning with Disk A. Disk A can be envisioned as containing logical chunks 1 through 4. Logical chunk 5 would be written to Disk B, which would contain logical chunks 5 through 8. Logical chunk 9 would be written to Disk C, which would contain chunks 9 through 12. The total capacity of volume d1 would be the combined capacities of the three drives. If each drive were 2 Gbytes, volume d1 would have an overall capacity of 6 Gbytes.
FIGURE 7-2 RAID 0 (Concatenation) Volume Example. (Diagram: interlaces 1 through 12 are written sequentially across physical slices A, B, and C, which Solaris Volume Manager presents as a single RAID 0 concatenation volume.)
FIGURE 7-3 RAID 0 (Concatenated Stripe) Volume Example. (Diagram: three stripes, each built from multiple physical slices, are concatenated by Solaris Volume Manager into a single RAID 0 volume containing interlaces 1 through 28.)
- Use components that are each on different controllers to increase the number of simultaneous reads and writes that can be performed.
- Do not create a stripe from an existing file system or data. Doing so will destroy data. Instead, use a concatenation. (You can create a stripe from existing data, but you must dump and restore the data to the volume.)
- Use the same size disk components for stripes. Striping components of different sizes results in wasted disk space.
- Set up a stripe's interlace value to better match the I/O requests made by the system or applications.
- Because a stripe or concatenation does not contain replicated data, when such a volume has a component failure you must replace the component, re-create the stripe or concatenation, and restore data from a backup. When you re-create a stripe or concatenation, use a replacement component that has at least the same size as the failed component.
- Concatenation uses fewer CPU cycles than striping and performs well for small random I/O and for even I/O distribution.
- When possible, distribute the components of a stripe or concatenation across different controllers and busses. Using stripes that are each on different controllers increases the number of simultaneous reads and writes that can be performed.
- If a stripe is defined on a failing controller and there is another available controller on the system, you can move the stripe to the new controller by moving the disks to the controller and redefining the stripe.
- Number of stripes: Another way of looking at striping is to first determine the performance requirements. For example, you might need 10.4 Mbytes/sec of performance for a selected application, and each disk might deliver approximately 4 Mbytes/sec. Based on this formula, you can then determine how many disk spindles you need to stripe across: 10.4 Mbytes/sec / 4 Mbytes/sec = 2.6. Therefore, you need three disks capable of performing I/O in parallel.
Scenario: RAID 0 Volumes
RAID 0 volumes provide the fundamental building blocks for aggregating storage or building mirrors. The following example, drawing on the sample system explained in Chapter 4, describes how RAID 0 volumes can provide larger storage spaces and allow you to construct a mirror of existing file systems, including root (/). The sample system has a collection of relatively small (9 Gbyte) disks, and it is entirely possible that specific applications would require larger storage spaces. To create larger spaces (and improve performance), the system administrator can create a stripe that spans multiple disks. For example, each of c1t1d0, c1t2d0, c1t3d0 and c2t1d0, c2t2d0, c2t3d0 could be formatted with a slice 0 that spans the entire disk. Then, a stripe including all three of the disks from the same controller could provide approximately 27 Gbytes of storage and allow faster access. The second stripe, from the second controller, can be used for redundancy, as described in Chapter 10 and specifically in Scenario: RAID 1 Volumes (Mirrors) on page 96.
CHAPTER 8 RAID 0 (Stripe and Concatenation) Volumes (Tasks)
- Create RAID 0 (stripe) volumes. Use the metainit command to create a new volume. See How to Create a RAID 0 (Stripe) Volume on page 78.
- Create RAID 0 (concatenation) volumes. Use the metainit command to create a new volume. See How to Create a RAID 0 (Concatenation) Volume on page 79.
- Expand storage space. Use the metainit command to expand an existing file system. See How to Expand Storage Space for Existing Data on page 81.
- Expand an existing RAID 0 volume. Use the metattach command to expand an existing volume. See How to Expand an Existing RAID 0 Volume on page 82.
- Remove a RAID 0 volume. Use the metaclear command to delete a volume. See How to Remove a RAID 0 Volume on page 84.
Caution Do not create volumes larger than 1 Tbyte if you expect to run the Solaris Operating Environment with a 32-bit kernel or if you expect to use a version of the Solaris Operating Environment prior to Solaris 9 4/03. See Overview of Large Volume Support in Solaris Volume Manager on page 47 for more information about large volume support in Solaris Volume Manager.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. Choose Action->Create Volume, then follow the instructions in the wizard. For more information, see the online help. Use the following form of the metainit command: metainit {volume-name} {number-of-stripes} {components-per-stripe} {component-names} [-i interlace-value]
volume-name is the name of the volume to create.
number-of-stripes specifies the number of stripes to create.
components-per-stripe specifies the number of components each stripe should have.
component-names specifies the names of the components that will be used.
-i interlace-value specifies the interlace width to use for the stripe.
See the following examples and the metainit(1M) man page for more information.
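For example, a command of the following form (the slice names are hypothetical) creates a stripe, d20, from three slices using the default interlace value:
# metainit d20 1 3 c0t1d0s2 c0t2d0s2 c0t3d0s2
d20: Concat/Stripe is setup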
The stripe, d20, consists of a single stripe (the number 1) that is made of three slices (the number 3). Because no interlace value is specified, the stripe uses the default of 16 Kbytes. The system confirms that the volume has been set up.
Example: Creating a RAID 0 (Stripe) Volume of Two Slices With a 32-Kbyte Interlace Value
# metainit d10 1 2 c0t1d0s2 c0t2d0s2 -i 32k
d10: Concat/Stripe is setup
The stripe, d10, consists of a single stripe (the number 1) that is made of two slices (the number 2). The -i option sets the interlace value to 32 Kbytes. (The interlace value cannot be less than 8 Kbytes, nor greater than 100 Mbytes.) The system verifies that the volume has been set up.
1. Check Prerequisites for Creating Solaris Volume Manager Components on page 46 and Background Information for Creating RAID 0 Volumes on page 74.
2. To create the concatenation, use one of the following methods:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. Choose Action->Create Volume, then follow the instructions in the wizard. For more information, see the online help.
- Use the following form of the metainit command:
metainit {volume-name} {number-of-stripes} { [components-per-stripe] | [component-names]}
volume-name is the name of the volume to create.
number-of-stripes specifies the number of stripes to create.
components-per-stripe specifies the number of components each stripe should have.
component-names specifies the names of the components that will be used.
For more information, see the following examples and the metainit(1M) man page.
This example shows the creation of a concatenation, d25, that consists of one stripe (the first number 1) made of a single slice (the second number 1 in front of the slice). The system verifies that the volume has been set up. This example shows a concatenation that can safely encapsulate existing data.
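A command of the following form (the slice name is hypothetical) would create such a single-slice concatenation; add the -f option if the slice contains a mounted file system:
# metainit d25 1 1 c0t1d0s2
d25: Concat/Stripe is setup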
This example creates a concatenation called d40 that consists of four stripes (the number 4), each made of a single slice (the number 1 in front of each slice). The system verifies that the volume has been set up.
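A command of the following form (the slice names are hypothetical) would create such a four-slice concatenation:
# metainit d40 4 1 c0t1d0s2 1 c0t2d0s2 1 c0t3d0s2 1 c0t4d0s2
d40: Concat/Stripe is setup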
1. Check Prerequisites for Creating Solaris Volume Manager Components on page 46 and Background Information for Creating RAID 0 Volumes on page 74.
2. Unmount the file system.
# umount /filesystem
3. To create the concatenation, use one of the following methods:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. Choose Action->Create Volume, then follow the instructions in the wizard. For more information, see the online help.
- Use the following form of the metainit command:
metainit {volume-name} {number-of-stripes} { [components-per-stripe] | [component-names]}
volume-name is the name of the volume to create.
number-of-stripes specifies the number of stripes to create.
components-per-stripe specifies the number of components each stripe should have.
component-names specifies the names of the components that will be used.
For more information, see the metainit(1M) man page.
4. Edit the /etc/vfstab file so that the file system references the name of the concatenation.
5. Remount the file system.
# mount /filesystem
Note that the first slice in the metainit command must be the slice that contains the file system. If not, you will corrupt your data. Next, the entry for the file system in the /etc/vfstab file is changed (or entered for the first time) to reference the concatenation. For example, the following line:
/dev/dsk/c0t1d0s2 /dev/rdsk/c0t1d0s2 /docs ufs 2 yes -
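Assuming the new concatenation were named d25 (a hypothetical name for this illustration), the entry would be changed to reference the volume instead of the slice:
/dev/md/dsk/d25 /dev/md/rdsk/d25 /docs ufs 2 yes -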
Caution Do not create volumes larger than 1 Tbyte if you expect to run the Solaris Operating Environment with a 32-bit kernel or if you expect to use a version of the Solaris Operating Environment prior to Solaris 9 4/03. See Overview of Large Volume Support in Solaris Volume Manager on page 47 for more information about large volume support in Solaris Volume Manager.
1. Check Prerequisites for Creating Solaris Volume Manager Components on page 46 and Background Information for Creating RAID 0 Volumes on page 74. 2. To create a concatenated stripe, use one of the following methods:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. Choose Action->Create Volume, then follow the instructions in the wizard. For more information, see the online help.
- To concatenate existing stripes from the command line, use the following form of the metattach command:
metattach {volume-name} {component-names}
volume-name is the name of the volume to expand.
component-names specifies the names of the components that will be used.
See Example: Creating a Concatenated Stripe By Attaching a Single Slice on page 83, Example: Creating a Concatenated Stripe By Adding Several Slices on page 83, and the metattach(1M) man page for more information.
This example illustrates how to attach a slice to an existing stripe, d2. The system confirms that the slice is attached.
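A command of the following form (the slice name is hypothetical) performs this attach:
# metattach d2 c1t2d0s2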
This example takes an existing three-way stripe, d25, and concatenates another three-way stripe. Because no interlace value is given for the attached slices, they inherit the interlace value configured for d25. The system verifies that the volume has been set up.
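A command of the following form (the slice names are hypothetical) attaches a second three-way stripe to d25:
# metattach d25 c1t0d0s2 c1t1d0s2 c1t2d0s2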
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. Choose Edit->Delete, then follow the instructions. For more information, see the online help.
- Use the following form of the metaclear command to delete the volume:
metaclear {volume-name}
See the following example and the metaclear(1M) man page for more information.
Example: Removing a Concatenation
# umount d8
# metaclear d8
d8: Concat/Stripe is cleared
This example illustrates clearing the concatenation d8 that also contains a mounted file system. The file system must be unmounted before the volume can be cleared. The system displays a confirmation message that the concatenation is cleared. If there is an entry in the /etc/vfstab file for this volume, delete that entry. You do not want to confuse the system by asking it to mount a file system on a nonexistent volume.
CHAPTER 9 RAID 1 (Mirror) Volumes (Overview)
- Overview of RAID 1 (Mirror) Volumes on page 87
- RAID 1 Volume (Mirror) Resynchronization on page 93
- Background Information for RAID 1 Volumes on page 94
- How Booting Into Single-User Mode Affects RAID 1 Volumes on page 96
If you have no existing data that you are mirroring and you are comfortable destroying all data on all submirrors, you can speed the creation process by creating all submirrors with a single command.
Overview of Submirrors
The RAID 0 volumes that are mirrored are called submirrors. A mirror is made of one or more RAID 0 volumes (stripes or concatenations). A mirror can consist of up to four submirrors. Practically, a two-way mirror is usually sufficient. A third submirror enables you to make online backups without losing data redundancy while one submirror is offline for the backup. If you take a submirror offline, the mirror stops reading and writing to the submirror. At this point, you could access the submirror itself, for example, to perform a backup. However, the submirror is in a read-only state. While a submirror is offline, Solaris Volume Manager keeps track of all writes to the mirror. When the submirror is brought back online, only the portions of the mirror that were written while the submirror was offline (resynchronization regions) are resynchronized. Submirrors can also be taken offline to troubleshoot or repair physical devices which have errors. Submirrors can be attached or detached from a mirror at any time, though at least one submirror must remain attached at all times. Normally, you create a mirror with only a single submirror. Then, you attach a second submirror after you create the mirror.
FIGURE 9-1 RAID 1 (Mirror) Volume Example. (Diagram: submirrors d21 and d22 each hold a complete copy of the data presented by RAID 1 volume d20.)
For example, with a pure RAID 0+1 implementation and a two-way mirror that consists of three striped slices, a single slice failure could fail one side of the mirror. And, assuming that no hot spares were in use, a second slice failure would fail the mirror. Using Solaris Volume Manager, up to three slices could potentially fail without failing the mirror, because each of the three striped slices is individually mirrored to its counterpart on the other half of the mirror. Consider this example:
FIGURE 9-2 RAID 1+0 Example. (Diagram: submirror 1 consists of physical slices A, B, and C; submirror 2 consists of physical slices D, E, and F; together they form the RAID 1 volume.)
Mirror d1 consists of two submirrors, each of which consists of three identical physical disks and the same interlace value. A failure of three disks, A, B, and F, can be tolerated because the entire logical block range of the mirror is still contained on at least one good disk. If, however, disks A and D fail, a portion of the mirror's data is no longer available on any disk and access to these logical blocks will fail. When a portion of a mirror's data is unavailable due to multiple slice errors, access to portions of the mirror where data is still available will succeed. Under this situation, the mirror acts like a single disk that has developed bad blocks. The damaged portions are unavailable, but the rest is available.
- When creating a RAID 1 volume from an existing file system built on a slice, only the single slice may be included in the primary RAID 0 volume (submirror). If you are mirroring root or other system-critical file systems, all submirrors must consist of only a single slice.
- Keep the slices of different submirrors on different disks and controllers. Data protection is diminished considerably if slices of two or more submirrors of the same mirror are on the same disk. Likewise, organize submirrors across separate controllers, because controllers and associated cables tend to fail more often than disks. This practice also improves mirror performance.
- Use the same type of disks and controllers in a single mirror. Particularly in old SCSI storage devices, different models or brands of disk or controller can have widely varying performance. Mixing the different performance levels in a single mirror can cause performance to degrade significantly.
- Use the same size submirrors. Submirrors of different sizes result in unused disk space.
- Only mount the mirror device directly. Do not try to mount a submirror directly, unless it is offline and mounted read-only. Do not mount a slice that is part of a submirror. This process could destroy data and crash the system.
- Mirroring might improve read performance, but write performance is always degraded. Mirroring improves read performance only in threaded or asynchronous I/O situations. No performance gain results if there is only a single thread reading from the volume.
- Experimenting with the mirror read policies can improve performance. For example, the default read mode is to alternate reads in a round-robin fashion among the disks. This policy is the default because it tends to work best for UFS multiuser, multiprocess activity. In some cases, the geometric read option improves performance by minimizing head motion and access time. This option is most effective when there is only one slice per disk, when only one process at a time is using the slice or file system, and when I/O patterns are highly sequential or when all accesses are read. To change mirror options, see How to Change RAID 1 Volume Options on page 115.
- Use the swap -l command to check for all swap devices. Each slice that is specified as swap must be mirrored independently from the remaining swap slices.
- Use only similarly configured submirrors within a mirror. In particular, if you create a mirror with an unlabeled submirror, you will be unable to attach any submirrors that contain disk labels.
Note If you have a mirrored file system in which the first submirror attached does not start on cylinder 0, all additional submirrors you attach must also not start on cylinder 0. If you attempt to attach a submirror starting on cylinder 0 to a mirror in which the original submirror does not start on cylinder 0, the following error message displays:
can't attach labeled submirror to an unlabeled mirror
You must ensure that all submirrors intended for use within a specific mirror either all start on cylinder 0, or that none of them start on cylinder 0. Starting cylinders do not have to be the same across all submirrors, but all submirrors must either include or not include cylinder 0.
You can configure the following RAID 1 volume options:
- Mirror read policy
- Mirror write policy
- The order in which mirrors are resynchronized (pass number)
You can define mirror options when you initially create the mirror, or after a mirror has been set up. For tasks related to changing these options, see How to Change RAID 1 Volume Options on page 115.
Read Policy
Round-Robin (Default): Attempts to balance the load across the submirrors. All reads are made in a round-robin order (one after another) from all submirrors in a mirror.
Geometric: Enables reads to be divided among submirrors on the basis of a logical disk block address. For instance, with a two-way submirror, the disk space on the mirror is divided into two equally sized logical address ranges. Reads from one submirror are restricted to one half of the logical range, and reads from the other submirror are restricted to the other half. The geometric read policy effectively reduces the seek time necessary for reads. The performance gained by this mode depends on the system I/O load and the access patterns of the applications.
First: Directs all reads to the first submirror. This policy should be used only when the device or devices that comprise the first submirror are substantially faster than those of the second submirror.
TABLE 9-2 Write Policy
Parallel (Default): A write to a mirror is replicated and dispatched to all of the submirrors simultaneously.
Serial: Performs writes to submirrors serially (that is, the first submirror write completes before the second is started). The serial option specifies that writes to one submirror must complete before the next submirror write is initiated. The serial option is provided in case a submirror becomes unreadable, for example, due to a power failure.
Full Resynchronization
When a new submirror is attached (added) to a mirror, all the data from another submirror in the mirror is automatically written to the newly attached submirror. Once the mirror resynchronization is done, the new submirror is readable. A submirror remains attached to a mirror until it is explicitly detached. If the system crashes while a resynchronization is in progress, the resynchronization is restarted when the system finishes rebooting.
Optimized Resynchronization
During a reboot following a system failure, or when a submirror that was offline is brought back online, Solaris Volume Manager performs an optimized mirror resynchronization. The metadisk driver tracks submirror regions and knows which submirror regions might be out-of-sync after a failure. An optimized mirror resynchronization is performed only on the out-of-sync regions. You can specify the order in which mirrors are resynchronized during reboot, and you can omit a mirror resynchronization by setting submirror pass numbers to 0 (zero). (See Pass Number on page 94 for information.)
Caution A pass number of 0 (zero) should only be used on mirrors that are mounted as read-only.
Partial Resynchronization
Following a replacement of a slice within a submirror, Solaris Volume Manager performs a partial mirror resynchronization of data. Solaris Volume Manager copies the data from the remaining good slices of another submirror to the replaced slice.
Pass Number
The pass number, a number in the range 0-9, determines the order in which a particular mirror is resynchronized during a system reboot. The default pass number is 1. Smaller pass numbers are resynchronized first. If 0 is used, the mirror resynchronization is skipped. A pass number of 0 should be used only for mirrors that are mounted as read-only. Mirrors with the same pass number are resynchronized at the same time.
Unmirroring: The Enhanced Storage tool within the Solaris Management Console does not support unmirroring root (/), /opt, /usr, or swap, or any other file system that cannot be unmounted while the system is running. Instead, use the command-line procedure for these file systems.
Attaching: You can attach a submirror to a mirror without interrupting service. You attach submirrors to mirrors to create two-way, three-way, and four-way mirrors.
Detach vs. Offline: When you place a submirror offline, you prevent the mirror from reading from and writing to the submirror, but you preserve the submirror's logical association to the mirror. While the submirror is offline, Solaris Volume Manager keeps track of all writes to the mirror, and they are written to the submirror when it is brought back online. By performing an optimized resynchronization, Solaris Volume Manager only has to resynchronize data that has changed, not the entire submirror. When you detach a submirror, you sever its logical association to the mirror. Typically, you place a submirror offline to perform maintenance. You detach a submirror to remove it.
Before you create a mirror, create the RAID 0 (stripe or concatenation) volumes that will make up the mirror. Any file system, including root (/), swap, and /usr, or any application such as a database, can use a mirror.
Caution When you create a mirror for an existing file system, be sure that the initial submirror contains the existing file system.
When creating a mirror, first create a one-way mirror, then attach a second submirror. This strategy starts a resynchronization operation and ensures that data is not corrupted. You can create a one-way mirror for a future two-way or multi-way mirror. You can create up to a four-way mirror. However, two-way mirrors usually provide sufficient data redundancy for most applications, and are less expensive in terms of disk drive costs. A three-way mirror enables you to take a submirror offline and perform a backup while maintaining a two-way mirror for continued data redundancy. Use components of identical size when creating submirrors. Using components of different sizes leaves wasted space in the mirror. Adding additional state database replicas before you create a mirror can improve the mirror's performance. As a general rule, add two additional replicas for each mirror you add to the system. Solaris Volume Manager uses these additional replicas to store the dirty region log (DRL), used to provide optimized resynchronization. By providing adequate numbers of replicas to prevent contention, or by using replicas on the same disks or controllers as the mirror they log, you will improve overall performance.
- You can change a mirror's pass number, and its read and write policies.
- Mirror options can be changed while the mirror is running.
CHAPTER 10 RAID 1 (Mirror) Volumes (Tasks)
- Create a mirror from unused slices. Use the Solaris Volume Manager GUI or the metainit command to create a mirror from unused slices. See How to Create a RAID 1 Volume From Unused Slices on page 99.
- Create a mirror from an existing file system. Use the Solaris Volume Manager GUI or the metainit command to create a mirror from an existing file system. See How to Create a RAID 1 Volume From a File System on page 101.
- Record the path to the alternate boot device for a mirrored root. Find the path to the alternate boot device and enter it in the boot instructions. See How to Record the Path to the Alternate Boot Device on page 106.
- Attach a submirror. Use the Solaris Volume Manager GUI or the metattach command to attach a submirror. See How to Attach a Submirror on page 108.
- Detach a submirror. Use the Solaris Volume Manager GUI or the metadetach command to detach the submirror. See How to Detach a Submirror on page 109.
- Place a submirror online or take a submirror offline. Use the Solaris Volume Manager GUI or the metaonline command to put a submirror online. Use the Solaris Volume Manager GUI or the metaoffline command to take a submirror offline. See How to Place a Submirror Offline and Online on page 110.
- Enable a component within a submirror. Use the Solaris Volume Manager GUI or the metareplace command to enable a slice in a submirror. See How to Enable a Slice in a Submirror on page 111.
- Check mirror status. Use the Solaris Volume Manager GUI or the metastat command to check the status of RAID 1 volumes. See How to Check the Status of Mirrors and Submirrors on page 113.
- Change RAID 1 volume options. Use the Solaris Volume Manager GUI or the metaparam command to change the options for a specific RAID 1 volume. See How to Change RAID 1 Volume Options on page 115.
- Expand a mirror. Use the Solaris Volume Manager GUI or the metattach command to expand the capacity of a mirror. See How to Expand a RAID 1 Volume on page 116.
- Replace a slice in a submirror. Use the Solaris Volume Manager GUI or the metareplace command to replace a slice in a submirror. See How to Replace a Slice in a Submirror on page 117.
- Replace a submirror. Use the Solaris Volume Manager GUI or the metattach command to replace a submirror. See How to Replace a Submirror on page 118.
- Remove a mirror (unmirror) of a file system. Use the Solaris Volume Manager GUI or the metadetach command or the metaclear command to unmirror a file system. See How to Unmirror a File System on page 119.
- Remove a mirror (unmirror) of a file system that cannot be unmounted. Use the Solaris Volume Manager GUI or the metadetach command or the metaclear command to unmirror a file system that cannot be unmounted.
- Use a mirror to perform backups. Use the Solaris Volume Manager GUI or the metaonline and metaoffline commands to perform backups with mirrors. See How to Use a RAID 1 Volume to Make an Online Backup on page 123.
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose Action->Create Volume and follow the instructions on screen. For more information, see the online help.
- Use the following form of the metainit command to create a one-way mirror:
metainit {volume-name} -m {submirror-name}
volume-name is the name of the volume to create.
-m specifies to create a mirror.
submirror-name specifies the name of the component that will be the first submirror in the mirror.
See the following examples and the metainit(1M) man page for more information. 4. To add the second submirror, use one of the following methods:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the mirror you want to modify. Choose Action->Properties, then the Submirrors tab, and follow the instructions on screen to Attach Submirror. For more information, see the online help.
- Use the following form of the metattach command:
metattach {mirror-name} {new-submirror-name}
mirror-name is the name of the RAID 1 volume to modify.
new-submirror-name specifies the name of the component that will be the next submirror in the mirror.
See the following examples and the metattach(1M) man page for more information.
This example shows the creation of a two-way mirror, d50. The metainit command creates two submirrors (d51 and d52), which are RAID 0 volumes. The metainit -m command creates the one-way mirror from the d51 RAID 0 volume. The metattach command attaches d52, creating a two-way mirror and causing a resynchronization. (Any data on the attached submirror is overwritten by the other submirror during the resynchronization.) The system verifies that the objects are defined.
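As a sketch (the slice names are hypothetical and the confirmation messages are approximate), the command sequence for this example would look like the following:
# metainit d51 1 1 c0t0d0s2
d51: Concat/Stripe is setup
# metainit d52 1 1 c1t0d0s2
d52: Concat/Stripe is setup
# metainit d50 -m d51
d50: Mirror is setup
# metattach d50 d52
d50: submirror d52 is attached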
This example creates a two-way mirror, d50. The metainit command creates two submirrors (d51 and d52), which are RAID 0 volumes. The metainit -m command with both submirrors creates the mirror from the d51 RAID 0 volume and avoids resynchronization. It is assumed that all information on the mirror is considered invalid and will be regenerated (for example, through a newfs operation) before the mirror is used.
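A sketch of this variant (again with hypothetical slice names) creates both submirrors and then builds the mirror from both of them in a single metainit invocation, which skips the resynchronization:
# metainit d51 1 1 c0t0d0s2
# metainit d52 1 1 c1t0d0s2
# metainit d50 -m d51 d52
d50: Mirror is setup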
If you are mirroring root on an x86 system, install the boot information on the alternate boot disk before you create the RAID 0 or RAID 1 devices. See Booting a System (Tasks) in System Administration Guide: Basic Administration.
In this procedure, an existing device is c1t0d0s0. A second device, c1t1d0s0, is available for the second half of the mirror. The submirrors will be d1 and d2, respectively, and the mirror will be d0.
Caution Be sure to create a one-way mirror with the metainit command and then attach the additional submirrors with the metattach command. When the metattach command is not used, no resynchronization operations occur. As a result, data could become corrupted when Solaris Volume Manager assumes that both sides of the mirror are identical and can be used interchangeably.
1. Check Prerequisites for Creating Solaris Volume Manager Components on page 46 and Background Information for Creating RAID 1 Volumes on page 95.
2. Identify the slice that contains the existing file system to be mirrored (c1t0d0s0 in this example).
3. Create a new RAID 0 volume on the slice from the previous step by using one of the following methods:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose Action->Create Volume and follow the instructions on screen. For more information, see the online help.
- Use the metainit -f raid-0-volume-name 1 1 ctds-of-slice command.
# metainit -f d1 1 1 c1t0d0s0
4. Create a second RAID 0 volume (concatenation) on an unused slice (c1t1d0s0 in this example) to act as the second submirror. The second submirror must be the same size as the original submirror or larger. Use one of the following methods:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose Action->Create Volume and follow the instructions on screen. For more information, see the online help.
- Use the metainit second-raid-0-volume-name 1 1 ctds-of-slice command.
# metainit d2 1 1 c1t1d0s0
5. Create a one-way mirror by using one of the following methods:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose Action->Create Volume and follow the instructions on screen. For more information, see the online help.
- Use the metainit mirror-name -m raid-0-volume-name command.
# metainit d0 -m d1
If you are mirroring any file system other than the root (/) file system, then edit the /etc/vfstab file so that the file system mount instructions refer to the mirror, not to the block device. For more information about the /etc/vfstab file, see Mounting File Systems in System Administration Guide: Basic Administration.
6. Remount your newly mirrored file system according to one of the following procedures:
- If you are mirroring your root (/) file system, run the metaroot d0 command, replacing d0 with the name of the mirror you just created, then reboot your system. For more information, see the metaroot(1M) man page.
- If you are mirroring a file system that can be unmounted, then unmount and remount the file system.
- If you are mirroring a file system other than root (/) that cannot be unmounted, then reboot your system.
7. Attach the second submirror to the mirror by using the metattach command. See the metattach(1M) man page for more information.
8. If you mirrored your root file system, record the alternate boot path. See How to Record the Path to the Alternate Boot Device on page 106.
The -f option forces the creation of the first concatenation, d1, which contains the mounted file system /master on /dev/dsk/c1t0d0s0. The second concatenation, d2, is created from /dev/dsk/c1t1d0s0. (This slice must be the same size or greater than that of d1.) The metainit command with the -m option creates the one-way mirror, d0, from d1. Next, the entry for the file system should be changed in the /etc/vfstab file to reference the mirror. For example, the following line:
/dev/dsk/c1t0d0s0 /dev/rdsk/c1t0d0s0 /var ufs 2 yes -
Finally, the file system is remounted and submirror d2 is attached to the mirror, causing a mirror resynchronization. The system confirms that the RAID 0 and RAID 1 volumes are set up, and that submirror d2 is attached.
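Put together, this example corresponds to a command sequence of roughly the following form (the confirmation messages are approximate):
# metainit -f d1 1 1 c1t0d0s0
d1: Concat/Stripe is setup
# metainit d2 1 1 c1t1d0s0
d2: Concat/Stripe is setup
# metainit d0 -m d1
d0: Mirror is setup
(edit the /etc/vfstab entry for the file system so that it references /dev/md/dsk/d0 and /dev/md/rdsk/d0)
# umount /master
# mount /master
# metattach d0 d2
d0: submirror d2 is attached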
d2: Concat/Stripe is setup
# metainit d0 -m d1
d0: Mirror is setup
# metaroot d0
# lockfs -fa
# reboot
...
# metattach d0 d2
d0: Submirror d2 is attached
# ls -l /dev/rdsk/c0t1d0s0
lrwxrwxrwx 1 root root 88 Feb 8 15:51 /dev/rdsk/c1t3d0s0 ->
../../devices/iommu@f,e0000000/vme@f,df010000/SUNW,pn@4d,1080000/ipi3sc@0,0/id@3,0:a,raw
Do not attach the second submirror before the system is rebooted. You must reboot between running the metaroot command and attaching the second submirror. The -f option forces the creation of the first RAID 0 volume, d1, which contains the mounted file system root (/) on /dev/dsk/c0t0d0s0. The second concatenation, d2, is created from /dev/dsk/c0t1d0s0. (This slice must be the same size or greater than that of d1.) The metainit command with the -m option creates the one-way mirror d0 using the concatenation that contains root (/). Next, the metaroot command edits the /etc/vfstab and /etc/system files so that the system can be booted with the root (/) file system on a volume. (It is a good idea to run the lockfs -fa command before rebooting.) After a reboot, the submirror d2 is attached to the mirror, causing a mirror resynchronization. (The system confirms that the concatenations and the mirror are set up, and that submirror d2 is attached.) The ls -l command is run on the root raw device to determine the path to the alternate root device in case the system might later need to be booted from it.
The -f option forces the creation of the first concatenation, d12, which contains the mounted file system /usr on /dev/dsk/c0t3d0s6. The second concatenation, d22, is created from /dev/dsk/c1t0d0s6. (This slice must be the same size or greater than that of d12.) The metainit command with the -m option creates the one-way mirror d2 using the concatenation containing /usr. Next, the /etc/vfstab file must be edited to change the entry for /usr to reference the mirror. For example, the following line:
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 1 yes -
After a reboot, the second submirror d22 is attached to the mirror, causing a mirror resynchronization. (The system confirms that the concatenation and the mirror are set up, and that submirror d22 is attached.)
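The sequence for this /usr example would take roughly the following form (the confirmation messages are approximate):
# metainit -f d12 1 1 c0t3d0s6
d12: Concat/Stripe is setup
# metainit d22 1 1 c1t0d0s6
d22: Concat/Stripe is setup
# metainit d2 -m d12
d2: Mirror is setup
(edit the /usr entry in /etc/vfstab to reference /dev/md/dsk/d2 and /dev/md/rdsk/d2)
# reboot
...
# metattach d2 d22
d2: submirror d22 is attached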
The -f option forces the creation of the first concatenation, d11, which contains the mounted file system swap on /dev/dsk/c0t0d0s1. The second concatenation, d21, is created from /dev/dsk/c1t0d0s1. (This slice must be the same size or greater than that of d11.) The metainit command with the -m option creates the one-way mirror d1 using the concatenation that contains swap. Next, if there is an entry for swap in the /etc/vfstab file, it must be edited to reference the mirror. For example, the following line:
/dev/dsk/c0t0d0s1 - - swap - no -
After a reboot, the second submirror d21 is attached to the mirror, causing a mirror resynchronization. (The system confirms that the concatenations and the mirror are set up, and that submirror d21 is attached.) To save the crash dump when you have mirrored swap, use the dumpadm command to configure the dump device as a volume. For instance, if the swap device is named /dev/md/dsk/d2, use the dumpadm command to set this device as the dump device.
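The sequence for this swap example would take roughly the following form (the confirmation messages are approximate):
# metainit -f d11 1 1 c0t0d0s1
d11: Concat/Stripe is setup
# metainit d21 1 1 c1t0d0s1
d21: Concat/Stripe is setup
# metainit d1 -m d11
d1: Mirror is setup
(edit the swap entry in /etc/vfstab, if one exists, to reference /dev/md/dsk/d1)
# reboot
...
# metattach d1 d21
d1: submirror d21 is attached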
Here you would record the string that follows the /devices directory: /sbus@1,f8000000/esp@1,200000/sd@3,0:a. Solaris Volume Manager users who are using a system with OpenBoot Prom can use the OpenBoot nvalias command to define a backup root device alias for the secondary root (/) mirror. For example:
ok nvalias backup_root /sbus@1,f8000000/esp@1,200000/sd@3,0:a
Then, redefine the boot-device alias to reference both the primary and secondary submirrors, in the order in which you want them to be used, and store the configuration.
ok printenv boot-device
boot-device = disk net
ok setenv boot-device disk backup_root net
boot-device = disk backup_root net
ok nvstore
In the event of primary root disk failure, the system would automatically boot to the second submirror. Or, if you boot manually, rather than using auto boot, you would only enter:
ok boot backup_root
Here, you would record the string that follows the /devices directory: /eisa/eha@1000,0/cmdk@1,0:a
If you receive the error message "can't attach labeled submirror to an unlabeled mirror," that indicates that you unsuccessfully attempted to attach a RAID 0 volume to a mirror. A labeled volume (submirror) is a volume whose first component starts at cylinder 0, while an unlabeled volume's first component starts at cylinder 1. To prevent the labeled submirror's label from being corrupted, Solaris Volume Manager does not allow labeled submirrors to be attached to unlabeled mirrors.
1. Identify the component (concatenation or stripe) to be used as a submirror. It must be the same size (or larger) as the existing submirror in the mirror. If you have not yet created a volume to be a submirror, see Creating RAID 0 (Stripe) Volumes on page 78 or Creating RAID 0 (Concatenation) Volumes on page 79. 2. Make sure that you have root privilege and that you have a current backup of all data. 3. Use one of the following methods to attach a submirror.
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, choose the mirror, then choose Action->Properties and click the Components tab. Follow the instructions on screen. For more information, see the online help.
- Use the metattach mirror submirror command.
# metattach mirror submirror
Example: Attaching a Submirror
# metastat d30
d30: mirror
    Submirror 0: d60
      State: Okay
...
# metattach d30 d70
d30: submirror d70 is attached
# metastat d30
d30: mirror
    Submirror 0: d60
      State: Okay
    Submirror 1: d70
      State: Resyncing
    Resync in progress: 41 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 2006130 blocks
...
This example shows the attaching of a submirror, d70, to a one-way mirror, d30, creating a two-way mirror. The mirror d30 initially consists of submirror d60. The submirror d70 is a RAID 0 volume. You verify that the status of the mirror is Okay with the metastat command, then attach the submirror. When the metattach command is run, the new submirror is resynchronized with the existing mirror. When you attach an additional submirror to the mirror, the system displays a message. To verify that the mirror is resynchronizing, use the metastat command.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, choose the mirror, then choose Action->Properties and click the Components tab. Follow the instructions on screen. For more information, see the online help. Use the metadetach command to detach a submirror from a mirror.
# metadetach mirror submirror
Example: Detaching a Submirror
# metastat d5
d5: mirror
    Submirror 0: d50
...
# metadetach d5 d50
d5: submirror d50 is detached
In this example, mirror d5 has a submirror, d50, which is detached with the metadetach command. The underlying slices from d50 are going to be reused elsewhere. When you detach a submirror from a mirror, the system displays a confirmation message.
Taking a submirror offline prevents the mirror from reading from and writing to the submirror, much like using the metadetach command. However, the metaoffline command does not sever the logical association between the submirror and the mirror.
1. Make sure that you have root privilege and that you have a current backup of all data. 2. Read Background Information for RAID 1 Volumes on page 94. 3. Use one of the following methods to place a submirror online or offline.
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, choose the mirror, then choose Action->Properties and click the Components tab. Follow the instructions on screen. For more information, see the online help.
- Use the metaoffline command to take a submirror offline.
# metaoffline mirror submirror
In this example, submirror d11 is taken offline from mirror d10. Reads will continue to be made from the other submirror. The mirror will be out of sync as soon as the first write is made. This inconsistency is corrected when the offlined submirror is brought back online.
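The offline command for this example takes the following form (to bring the submirror back online later, the metaonline command takes the same mirror and submirror arguments):
# metaoffline d10 d11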
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, choose the mirror, then choose Action->Properties and click the Components tab. Follow the instructions on screen. For more information, see the online help. Use the metareplace command to enable a failed slice in a submirror.
# metareplace -e mirror failed-slice
The metareplace command automatically starts a resynchronization to synchronize the repaired or replaced slice with the rest of the mirror. See the metareplace(1M) man page for more information.
In this example, the mirror d11 has a submirror that contains the slice c1t4d0s7, which had a soft error. The metareplace command with the -e option enables the failed slice. If a physical disk is defective, you can either replace it with another available disk (and its slices) on the system, as documented in How to Replace a Slice in a Submirror on page 117. Alternatively, you can repair or replace the disk, format it, and run the metareplace command with the -e option as shown in this example.
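The enabling command for this example takes the following form:
# metareplace -e d11 c1t4d0s7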
Okay: The submirror has no errors and is functioning correctly.
Resyncing: The submirror is actively being resynchronized. An error has occurred and been corrected, the submirror has just been brought back online, or a new submirror has been added.
Needs Maintenance: A slice (or slices) in the submirror has encountered an I/O error or an open error. All reads and writes to and from this slice in the submirror have been discontinued.
Additionally, for each slice in a submirror, the metastat command shows the Device (device name of the slice in the stripe); Start Block on which the slice begins; Dbase to show if the slice contains a state database replica; State of the slice; and Hot Spare to show the slice being used to hot spare a failed slice. The slice state is perhaps the most important information when you are troubleshooting mirror errors. The submirror state only provides general status information, such as Okay or Needs Maintenance. If the submirror reports a Needs Maintenance state, refer to the slice state. You take a different recovery action if the slice is in the Maintenance or Last Erred state. If you only have slices in the Maintenance state, they can be repaired in any order. If you have slices in the Maintenance state and a slice in the Last Erred state, you must fix the slices in the Maintenance state first, and then the Last Erred slice. See Overview of Replacing and Enabling Components in RAID 1 and RAID 5 Volumes on page 241. The following table explains the slice states for submirrors and possible actions to take.
TABLE 10-2 Submirror Slice States
Okay
Action: None.
Resyncing
The component is actively being resynchronized. An error has occurred and been corrected, the submirror has just been brought back online, or a new submirror has been added.
Maintenance
The component has encountered an I/O error or an open error. All reads and writes to and from this component have been discontinued.
Action: Enable or replace the failed component. See How to Enable a Slice in a Submirror on page 111, or How to Replace a Slice in a Submirror on page 117. The metastat command will show an invoke recovery message with the appropriate action to take with the metareplace command. You can also use the metareplace -e command.
Last Erred
The component has encountered an I/O error or an open error. However, the data is not replicated elsewhere due to another slice failure. I/O is still performed on the slice. If I/O errors result, the mirror I/O will fail.
Action: First, enable or replace components in the Maintenance state. See How to Enable a Slice in a Submirror on page 111, or How to Replace a Slice in a Submirror on page 117. Usually, this error results in some data loss, so validate the mirror after it is fixed. For a file system, use the fsck command, then check the data. An application or database must have its own method of validating the device.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, choose the mirror, then choose Action->Properties. Follow the instructions on screen. For more information, see the online help. Run the metastat command on a mirror to see the state of each submirror, the pass number, the read option, the write option, and the total number of blocks in the mirror. For example, to check the status of the one-way mirror d70, use:
# metastat d70
d70: Mirror
    Submirror 0: d71
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12593637 blocks
d71: Submirror of d70
    State: Okay
    Size: 12593637 blocks
    Stripe 0:
        Device     Start Block  Dbase  State  Hot Spare
        c1t3d0s3            0   No     Okay
    Stripe 1:
        Device     Start Block  Dbase  State  Hot Spare
        c1t3d0s4            0   No     Okay
    Stripe 2:
        Device     Start Block  Dbase  State  Hot Spare
        c1t3d0s5            0   No     Okay
For each submirror in the mirror, the metastat command shows the state, an invoke line if there is an error, the assigned hot spare pool (if any), the size in blocks, and information about each slice in the submirror. See How to Change RAID 1 Volume Options on page 115 to change a mirror's pass number, read option, or write option. See the metastat(1M) man page for more information about checking device status.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, choose the mirror, then choose Action->Properties. Follow the instructions on screen. For more information, see the online help. Use the metaparam command to display and change a mirror's options. For example, to change a mirror to first, rather than round-robin, for reading, use the following:
# metaparam -r first mirror
See RAID 1 Volume Options on page 91 for a description of mirror options. Also see the metaparam(1M) man page.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, choose the mirror, then choose Action->Properties and click the Components tab. Follow the instructions on screen. For more information, see the online help. Use the metattach command to attach additional slices to each submirror. For example, to attach a component to a submirror, use the following:
# metattach submirror component
Each submirror in a mirror must be expanded. See the metattach(1M) man page for more information.
This example shows how to expand a mirrored mounted file system by concatenating two disk drives to the mirror's two submirrors. The mirror is named d8 and contains two submirrors named d9 and d10.
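A minimal command sketch for this example follows; the slice names c0t2d0s2 and c1t2d0s2 are hypothetical, and the confirmation messages are illustrative only:

# metattach d9 c0t2d0s2
d9: component is attached
# metattach d10 c1t2d0s2
d10: component is attached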
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, choose the mirror, then choose Action->Properties and click the Components tab. Follow the instructions on screen. For more information, see the online help. Use the following form of the metareplace command to replace a slice in a submirror:

metareplace mirror-name component-name new-component

- mirror-name is the name of the mirror (volume) that contains the component to be replaced.
- component-name specifies the name of the component that is to be replaced.
- new-component specifies the name of the component to add to the mirror in place of the failed component.

See the following examples and the metareplace(1M) man page for more information.
# metastat d6
d6: Mirror
    Submirror 0: d16
      State: Okay
    Submirror 1: d26
      State: Needs maintenance
...
d26: Submirror of d6
    State: Needs maintenance
    Invoke: metareplace d6 c0t2d0s2 <new device>
...
# metareplace d6 c0t2d0s2 c0t2d2s2
d6: device c0t2d0s2 is replaced with c0t2d2s2
The metastat command confirms that mirror d6 has a submirror, d26, with a slice in the Needs maintenance state. The metareplace command replaces the slice as specified in the Invoke line of the metastat output with another available slice on the system. The system confirms that the slice is replaced, and starts resynchronizing the submirror.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, choose the mirror, then choose Action->Properties and click the Components tab. Follow the instructions on screen. For more information, see the online help. Use the metadetach, metaclear, metainit, and metattach commands to replace an entire submirror.
# metastat d20
...
      State: Okay
    Submirror 1: d22
      State: Needs maintenance
...
# metadetach -f d20 d22
d20: submirror d22 is detached
# metaclear -f d22
d22: Concat/Stripe is cleared
# metainit d22 2 1 c1t0d0s2 1 c1t0d1s2
d22: Concat/Stripe is setup
# metattach d20 d22
d20: components are attached
The metastat command confirms that the two-way mirror d20 has a submirror, d22, in the Needs maintenance state. In this case, the entire submirror will be cleared and recreated. The metadetach command detaches the failed submirror from the mirror by using the -f option, which forces the detach to occur. The metaclear command clears the submirror. The metainit command recreates submirror d22, with new slices. The metattach command attaches the rebuilt submirror, and a mirror resynchronization begins automatically. You temporarily lose the capability for data redundancy while the mirror is a one-way mirror.
# umount /home
5. Detach the submirror that will continue to be used for the file system. For more information, see the metadetach(1M) man page.
# metadetach d1 d10
6. Clear the mirror and remaining subcomponents. For more information, see the metaclear(1M) man page.
# metaclear -r d1
7. Edit the /etc/vfstab file to use the component detached in Step 5, if necessary.
8. Remount the file system.
In this example, the /opt file system is made of a two-way mirror named d4; its submirrors are d2 and d3, made of slices /dev/dsk/c0t0d0s0 and /dev/dsk/c1t0d0s0, respectively. The metastat command verifies that at least one submirror is in the Okay state. (A mirror with no submirrors in the Okay state must be repaired first.) The file system is unmounted, then submirror d2 is detached. The metaclear -r command deletes the mirror and the other submirror, d3. Next, the entry for /opt in the /etc/vfstab file is changed to reference the underlying slice. For example, if d4 were the mirror and d2 the submirror, the following line:
/dev/md/dsk/d4 /dev/md/rdsk/d4 /opt ufs 2 yes -
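would be changed to reference the remaining submirror, d2. A sketch of the edited entry:

/dev/md/dsk/d2 /dev/md/rdsk/d2 /opt ufs 2 yes -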
By using the submirror name, you can continue to have the file system mounted on a volume. Finally, the /opt file system is remounted. By using d2 instead of d4 in the /etc/vfstab file, you have unmirrored the mirror. Because d2 consists of a single slice, you can mount the file system on the slice name (/dev/dsk/c0t0d0s0) if you do not want the device to support a volume.
In this example, root (/) is a two-way mirror named d0; its submirrors are d10 and d20, which are made of slices /dev/dsk/c0t3d0s0 and /dev/dsk/c1t3d0s0, respectively. The metastat command verifies that at least one submirror is in the Okay state. (A mirror with no submirrors in the Okay state must first be repaired.) Submirror d20 is detached to make d0 a one-way mirror. The metaroot command is then run, using the root slice from which the system is going to boot. This command
edits the /etc/system and /etc/vfstab files to remove information that specifies the mirroring of root (/). After a reboot, the metaclear -r command deletes the mirror and the other submirror, d10. The last metaclear command clears submirror d20.
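A minimal command sketch of this sequence, using the device names from this example (confirmation output is omitted):

# metadetach d0 d20
# metaroot /dev/dsk/c0t3d0s0
# reboot
...
# metaclear -r d0
# metaclear d20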
Example: Unmirroring swap
# metastat d1
d1: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d21
      State: Okay
...
# metadetach d1 d21
d1: submirror d21 is detached
(Edit the /etc/vfstab file to change the entry for swap from metadevice to slice name)
# reboot
...
# metaclear -r d1
d1: Mirror is cleared
d11: Concat/Stripe is cleared
# metaclear d21
d21: Concat/stripe is cleared
In this example, swap is made of a two-way mirror named d1; its submirrors are d11 and d21, which are made of slices /dev/dsk/c0t3d0s1 and /dev/dsk/c1t3d0s1, respectively. The metastat command verifies that at least one submirror is in the Okay state. (A mirror with no submirrors in the Okay state must first be repaired.) Submirror d21 is detached to make d1 a one-way mirror. Next, the /etc/vfstab file must be edited to change the entry for swap to reference the slice that is in submirror d21. For example, if d1 was the mirror, and d21 the submirror containing slice /dev/dsk/c0t3d0s1, the following line:
/dev/md/dsk/d1 - - swap - no -
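would be changed to reference the slice directly. A sketch of the edited entry:

/dev/dsk/c0t3d0s1 - - swap - no -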
After a reboot, the metaclear -r command deletes the mirror and the other submirror, d11. The final metaclear command clears submirror d21.
If you use this procedure on a two-way mirror, be aware that data redundancy is lost while one submirror is offline for backup. A multi-way mirror does not have this problem. There is some overhead on the system when the reattached submirror is resynchronized after the backup is complete.
This procedure involves the following steps:
- Write-locking the file system (UFS only). Do not lock root (/).
- Flushing all data from cache to disk.
- Using the metadetach command to take one submirror off of the mirror.
- Unlocking the file system.
- Using the fsck command to check the file system on the detached submirror.
- Backing up the data on the detached submirror.
- Using the metattach command to place the detached submirror back in the mirror.
Note If you use these procedures regularly, put them into a script for ease of use.
Tip The safer approach to this process is to attach a third or fourth submirror to the mirror, allow it to resync, and use it for the backup. This technique ensures that data redundancy is maintained at all times.
1. Run the metastat command to make sure the mirror is in the Okay state. A mirror that is in the Maintenance state should be repaired first.
2. Flush data and UFS logging data from cache to disk and write-lock the file system.
# /usr/sbin/lockfs -w mount-point
Only a UFS volume needs to be write-locked. If the volume is set up as a raw device for database management software or some other application, running lockfs is not necessary. (You might, however, want to run the appropriate vendor-supplied utility to flush any buffers and lock access.)
Caution Write-locking root (/) causes the system to hang, so it should never be performed. If you are backing up your root file system, skip this step.
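3. Use the metadetach command to take one submirror off of the mirror. The following is the generic form; mirror and submirror are placeholders for your own volume names:

# metadetach mirror submirror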
In this command:
mirror       Is the volume name of the mirror.
submirror    Is the volume name of the submirror (volume) being detached.
Reads will continue to be made from the other submirror. The mirror will be out of sync as soon as the first write is made. This inconsistency is corrected when the detached submirror is reattached in Step 7.
4. Unlock the file system and allow writes to continue.
# /usr/sbin/lockfs -u mount-point
You might need to perform necessary unlocking procedures based on vendor-dependent utilities used in Step 2 above.
5. Use the fsck command to check the file system on the detached submirror to ensure a clean backup.
# fsck /dev/md/rdsk/name
6. Perform a backup of the offlined submirror. Use the ufsdump command or your usual backup utility.
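For example, a full backup to a local tape drive might look like the following; the tape device /dev/rmt/0 and the volume d4 are hypothetical:

# ufsdump 0ucf /dev/rmt/0 /dev/md/rdsk/d4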
Note To ensure a proper backup, use the raw volume, for example, /dev/md/rdsk/d4. Using rdsk allows greater than 2 Gbyte access.
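7. Use the metattach command to place the detached submirror back in the mirror. The following is the generic form; mirror and submirror are placeholders for your own volume names:

# metattach mirror submirror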
Solaris Volume Manager automatically begins resynchronizing the submirror with the mirror.
CHAPTER 11
Soft Partitions (Overview)
- Overview of Soft Partitions on page 127
- Configuration Guidelines for Soft Partitions on page 128
You use soft partitions to divide a disk slice or logical volume into as many partitions as needed. You must provide a name for each division or soft partition, just like you do for other storage volumes, such as stripes or mirrors. A soft partition, once named, can be accessed by applications, including file systems, as long as the soft partition is not included in another volume. Once included in a volume, the soft partition should no longer be directly accessed. Soft partitions can be placed directly above a disk slice, or on top of a mirror, stripe, or RAID 5 volume. A soft partition may not be both above and below other volumes. For example, a soft partition built on a stripe with a mirror built on the soft partition is not allowed. A soft partition appears to file systems and other applications to be a single contiguous logical volume. However, the soft partition actually comprises a series of extents that could be located at arbitrary locations on the underlying media. In addition to the soft partitions, extent headers (also called system recovery data areas) on disk record information about the soft partitions to facilitate recovery in the event of a catastrophic system failure.
While it is technically possible to manually place extents of soft partitions at arbitrary locations on disk (as you can see in the output of metastat -p, described in Viewing the Solaris Volume Manager Configuration on page 228), you should allow the system to place them automatically. Although you can build soft partitions on any slice, creating a single slice that occupies the entire disk and then creating soft partitions on that slice is the most efficient way to use soft partitions at the disk level. Because the maximum size of a soft partition is limited to the size of the slice or logical volume on which it is built, you should build a volume on top of your disk slices, then build soft partitions on top of the volume. This strategy allows you to add components to the volume later, then expand the soft partitions as needed. For maximum flexibility and high availability, build RAID 1 (mirror) or RAID 5 volumes on disk slices, then create soft partitions on the mirror or RAID 5 volume.
Scenario: Soft Partitions
Soft partitions provide tools with which to subdivide larger storage spaces into more manageable spaces. For example, in other scenarios (Scenario: RAID 1 Volumes (Mirrors) on page 96 or Scenario: RAID 5 Volumes on page 142), large storage aggregations provided redundant storage of many gigabytes. However, many possible scenarios would not require so much space, at least at first. Soft partitions allow you to subdivide that storage space into more manageable sections. Each of those sections can have a complete file system. For example, you could create 1000 soft partitions on top of a RAID 1 or RAID 5 volume so that each of your users can have a home directory on a separate file system. If a user needs more space, simply expand the soft partition.
CHAPTER 12
Soft Partitions (Tasks)
- Create soft partitions: Use the Solaris Volume Manager GUI or the metainit command to create soft partitions. See How to Create a Soft Partition on page 132.
- Check the status of soft partitions: Use the Solaris Volume Manager GUI or the metastat command to check the status of soft partitions. See How to Check the Status of a Soft Partition on page 133.
- Expand soft partitions: Use the Solaris Volume Manager GUI or the metattach command to expand soft partitions. See How to Expand a Soft Partition on page 134.
- Remove soft partitions: Use the Solaris Volume Manager GUI or the metaclear command to remove soft partitions. See How to Remove a Soft Partition on page 135.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. Choose Action->Create Volume, then follow the instructions in the wizard. For more information, see the online help. To create a soft partition, use the following form of the metainit command:
metainit [-s set] soft-partition -p [-e] component size
- -s is used to specify which disk set is being used. If -s isn't specified, the local (default) disk set is used.
- -e is used to specify that the entire disk should be reformatted. The format provides a slice 0, taking most of the disk, and a slice 7 of a minimum of 4 Mbytes in size to contain a state database replica.
- soft-partition is the name of the soft partition. The name is of the form dnnn, where nnn is a number in the range of 0 to 8192.
- component is the disk, slice, or (logical) volume from which to create the soft partition. All existing data on the component is destroyed because the soft partition headers are written at the beginning of the component.
- size is the size of the soft partition, and is specified as a number followed by one of the following:
  - M or m for megabytes
  - G or g for gigabytes
  - T or t for terabytes
  - B or b for blocks (sectors)
See the following examples and the metainit(1M) man page for more information.
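For instance, the following hypothetical command creates a 4-Gbyte soft partition named d20 on slice c1t3d0s2 (the names and size are illustrative only):

# metainit d20 -p c1t3d0s2 4g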
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. Choose the soft partition that you want to monitor, then choose Action->Properties, then follow the instructions on screen. For more information, see the online help. To view the existing configuration, use the following format of the metastat command:
metastat soft-partition
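A hypothetical invocation for a soft partition named d1 built on a mirror might begin as follows (the names, sizes, and exact output format are illustrative only); the output then continues with the underlying mirror, as shown below:

# metastat d1
d1: soft partition
    component: d100
    state: OKAY
    size: 42674285 blocks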
d100: Mirror
    Submirror 0: d10
      State: OKAY
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 426742857 blocks

d10: Submirror of d100
    State: OKAY
    Hot spare pool: hsp002
    Size: 426742857 blocks
    Stripe 0: (interlace: 32 blocks)
        Device       Start Block  Dbase State        Hot Spare
        c3t3d0s0            0     No    Okay
1. Read the Configuration Guidelines for Soft Partitions on page 128.
2. Use one of the following methods to expand a soft partition:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. Choose the soft partition that you want to expand, then choose Action->Properties, then follow the instructions on screen. For more information, see the online help. To add space to a soft partition, use the following form of the metattach command:
metattach [-s disk-set] soft-partition size
disk-set is the name of the disk set in which the soft partition exists. soft-partition is the name of an existing soft partition. size is the amount of space to add.
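For example, the following hypothetical command adds 10 Gbytes to an existing soft partition named d20 (the name and size are illustrative):

# metattach d20 10g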
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. Choose the soft partition that you want to remove, then choose Action->Properties, then follow the instructions on screen. For more information, see the online help. To delete a soft partition, use one of the following forms of the metaclear command:
metaclear [-s disk-set] component
metaclear [-s disk-set] -r soft-partition
metaclear [-s disk-set] -p component
where:
- disk-set is the disk set in which the soft partition exists.
- soft-partition is the soft partition to delete.
- -r specifies to recursively delete logical volumes, but not volumes on which others depend.
- -p specifies to purge all soft partitions on the specified component, except those soft partitions that are open.
- component is the component from which to clear all of the soft partitions.
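For instance, assuming a soft partition named d20 built on slice c1t3d0s2 (both names hypothetical), either of the following could be used:

# metaclear d20
# metaclear -p c1t3d0s2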
CHAPTER 13
RAID 5 Volumes (Overview)
- Overview of RAID 5 Volumes on page 137
- Background Information for Creating RAID 5 Volumes on page 140
- Overview of Replacing and Enabling Slices in RAID 5 Volumes on page 142
Example: RAID 5 Volume
Figure 13-1 shows a RAID 5 volume, d40. The first three data chunks are written to Disks A through C. The next chunk that is written is a parity chunk, written to Drive D, which consists of an exclusive OR of the first three chunks of data. This pattern of writing data and parity chunks results in both data and parity being spread across all disks in the RAID 5 volume. Each drive can be read independently. The parity protects against a single disk failure. If each disk in this example were 2 Gbytes, the total capacity of d40 would be 6 Gbytes. (One drive's worth of space is allocated to parity.)
FIGURE 13-1 RAID 5 Volume Example
FIGURE 13-2
The parity areas are allocated when the initial RAID 5 volume is created. One component's worth of space is allocated to parity, although the actual parity blocks are distributed across all of the original components to distribute I/O. When you concatenate additional components to the RAID, the additional space is devoted entirely to data. No new parity blocks are allocated. The data on the concatenated components is, however, included in the parity calculations, so it is protected against single device failures. Concatenated RAID 5 volumes are not suited for long-term use. Use a concatenated RAID 5 volume until it is possible to reconfigure a larger RAID 5 volume and copy the data to the larger volume.
Note When you add a new component to a RAID 5 volume, Solaris Volume Manager zeros all the blocks in that component. This process ensures that the parity will protect the new data. As data is written to the additional space, Solaris Volume Manager includes it in the parity calculations.
- A RAID 5 volume must consist of at least three components. The more components a RAID 5 volume contains, however, the longer read and write operations take when a component fails.
- RAID 5 volumes cannot be striped, concatenated, or mirrored.
- Do not create a RAID 5 volume from a component that contains an existing file system. Doing so will erase the data during the RAID 5 initialization process.
- When you create a RAID 5 volume, you can define the interlace value. If not specified, the interlace value is 16 Kbytes. This value is reasonable for most applications.
- A RAID 5 volume (with no hot spares) can only handle a single component failure.
- When you create RAID 5 volumes, use components across separate controllers, because controllers and associated cables tend to fail more often than disks.
- Use components of the same size. Creating a RAID 5 volume with components of different sizes results in unused disk space.
- Because of the complexity of parity calculations, volumes with greater than about 20 percent writes should probably not be RAID 5 volumes. If data redundancy on a write-heavy volume is needed, consider mirroring.
- If the different components in the RAID 5 volume reside on different controllers and the accesses to the volume are primarily large sequential accesses, then setting the interlace value to 32 Kbytes might improve performance.
- You can expand a RAID 5 volume by concatenating additional components to the volume. Concatenating a new component to an existing RAID 5 volume decreases the overall performance of the volume because the data on concatenations is sequential. Data is not striped across all components. The original components of the volume have data and parity striped across all components. This striping is lost for the concatenated component, although the data is still recoverable from errors because the parity is used during the component I/O. The resulting RAID 5 volume continues to handle a single component failure.
- Concatenated components also differ in the sense that they do not have parity striped on any of the regions. Thus, the entire contents of the component are available for data. Any performance enhancements for large or sequential writes are lost when components are concatenated.
You can create a RAID 5 volume without having to zero out the data blocks. To do so, do one of the following:
- Use the metainit command with the -k option. The -k option recreates the RAID 5 volume without initializing it, and sets the disk blocks to the OK state. This option is potentially dangerous, as any errors that exist on disk blocks within the volume will cause unpredictable behavior from Solaris Volume Manager, including the possibility of fabricated data.
- Initialize the device and restore data from tape. See the metainit(1M) man page for more information.
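As a sketch only, re-creating a RAID 5 volume without initialization might look like the following; the volume and slice names are hypothetical, and the -k option placement should be confirmed against the metainit(1M) man page (note the caution above about the risks of this option):

# metainit d45 -r c1t1d0s2 c2t1d0s2 c3t1d0s2 -k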
Scenario: RAID 5 Volumes
RAID 5 volumes allow you to have redundant storage without the overhead of RAID 1 volumes, which require two times the total storage space to provide data redundancy. By setting up a RAID 5 volume, you can provide redundant storage of greater capacity than you could achieve with RAID 1 on the same set of disk components, and, with the help of hot spares (see Chapter 15 and specifically How Hot Spares Work on page 154), nearly the same level of safety. The drawbacks are increased write time and markedly impaired performance in the event of a component failure, but those tradeoffs might be insignificant for many situations. The following example, drawing on the sample system explained in Chapter 4, describes how RAID 5 volumes can provide extra storage capacity. Other scenarios for RAID 0 and RAID 1 volumes used 6 slices (c1t1d0, c1t2d0, c1t3d0, c2t1d0, c2t2d0, c2t3d0) on six disks, spread over two controllers, to provide 27 Gbytes of redundant storage. By using the same slices in a RAID 5 configuration, 45 Gbytes of storage is available, and the configuration can withstand a single component failure without data loss or access interruption. By adding hot spares to the configuration, the RAID 5 volume can withstand additional component failures. The most significant drawback to this approach is that a controller failure would result in data loss to this RAID 5 volume, while it would not with the RAID 1 volume described in Scenario: RAID 1 Volumes (Mirrors) on page 96.
CHAPTER 14
RAID 5 Volumes (Tasks)
- Create RAID 5 volumes: Use the Solaris Volume Manager GUI or the metainit command to create RAID 5 volumes. See How to Create a RAID 5 Volume on page 144.
- Check the status of RAID 5 volumes: Use the Solaris Volume Manager GUI or the metastat command to check the status of RAID 5 volumes. See How to Check the Status of a RAID 5 Volume on page 145.
- Expand a RAID 5 volume: Use the Solaris Volume Manager GUI or the metattach command to expand RAID 5 volumes. See How to Expand a RAID 5 Volume on page 148.
- Enable a slice in a RAID 5 volume: Use the Solaris Volume Manager GUI or the metareplace command to enable slices in RAID 5 volumes. See How to Enable a Component in a RAID 5 Volume on page 149.
- Replace a slice in a RAID 5 volume: Use the Solaris Volume Manager GUI or the metareplace command to replace slices in RAID 5 volumes. See How to Replace a Component in a RAID 5 Volume on page 150.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose Action->Create Volume and follow the steps in the wizard. For more information, see the online help. Use the following form of the metainit command:
metainit name -r component component component
- name is the name for the volume to create.
- -r specifies to create a RAID 5 volume.
- component specifies a slice or soft partition to include in the RAID 5 volume.
To specify an interlace value, add the -i interlace-value option. For more information, see the metainit(1M) man page.
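For instance, a minimal sketch with hypothetical slice names (the confirmation message is illustrative):

# metainit d45 -r c2t3d0s2 c3t0d0s2 c4t0d0s2
d45: RAID is setup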
In this example, the RAID 5 volume d45 is created with the -r option from three slices. Because no interlace value is specified, d45 uses the default of 16 Kbytes. The system verifies that the RAID 5 volume has been set up, and begins initializing the volume. You must wait for the initialization to finish before you can use the RAID 5 volume.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node and view the status of the volumes. Choose a volume, then choose Action->Properties to see more detailed information. For more information, see the online help. Use the metastat command. For each slice in the RAID 5 volume, the metastat command shows the following:
- Device (device name of the slice in the stripe)
- Start Block on which the slice begins
- Dbase to show if the slice contains a state database replica
- State of the slice
- Hot Spare to show the slice being used to hot spare a failed slice
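A hypothetical example of such output for a healthy three-slice RAID 5 volume (device names and sizes are illustrative only):

# metastat d45
d45: RAID
    State: Okay
    Interlace: 32 blocks
    Size: 20985804 blocks
Original device:
    Size: 20987680 blocks
        Device       Start Block  Dbase State        Hot Spare
        c2t3d0s2          330     No    Okay
        c3t0d0s2          330     No    Okay
        c4t0d0s2          330     No    Okay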
The metastat command output identifies the volume as a RAID 5 volume. For each slice in the RAID 5 volume, it shows the name of the slice in the stripe, the block on which the slice begins, an indicator that none of these slices contain a state database replica, that all the slices are okay, and that none of the slices are hot spare replacements for a failed slice.
RAID 5 States

State: Initializing
Meaning: Slices are in the process of having all disk blocks zeroed. This process is necessary due to the nature of RAID 5 volumes with respect to data and parity interlace striping. Once the state changes to Okay, the initialization process is complete and you are able to open the device. Up to this point, applications receive error messages.

State: Okay
Meaning: The device is ready for use and is currently free from errors.

State: Maintenance
Meaning: A slice has been marked as failed due to I/O or open errors that were encountered during a read or write operation.
The slice state is perhaps the most important information when you are troubleshooting RAID 5 volume errors. The RAID 5 state only provides general status information, such as Okay or Needs Maintenance. If the RAID 5 volume reports a Needs Maintenance state, refer to the slice state. You take a different recovery action if the slice is in the Maintenance or Last Erred state. If you only have a slice in the Maintenance state, it can be repaired without loss of data. If you have a slice in the Maintenance state and a slice in the Last Erred state, data has probably been corrupted. You must fix the slice in the Maintenance state first, then the Last Erred slice. See Overview of Replacing and Enabling Components in RAID 1 and RAID 5 Volumes on page 241. The following table explains the slice states for a RAID 5 volume and possible actions to take.
RAID 5 Slice States

State: Initializing
Meaning: Slices are in the process of having all disk blocks zeroed. This process is necessary due to the nature of RAID 5 volumes with respect to data and parity interlace striping.
Action: Normally none. If an I/O error occurs during this process, the device goes into the Maintenance state. If the initialization fails, the volume is in the Initialization Failed state, and the slice is in the Maintenance state. If this happens, clear the volume and re-create it.

State: Okay
Meaning: The device is ready for use and is currently free from errors.
Action: None. Slices can be added or replaced, if necessary.

State: Resyncing
Meaning: The slice is actively being resynchronized. An error has occurred and been corrected, a slice has been enabled, or a slice has been added.
Action: If desired, monitor the RAID 5 volume status until the resynchronization is done.

State: Maintenance
Meaning: A single slice has been marked as failed due to I/O or open errors that were encountered during a read or write operation.
Action: Enable or replace the failed slice. See How to Enable a Component in a RAID 5 Volume on page 149, or How to Replace a Component in a RAID 5 Volume on page 150. The metastat command will show an invoke recovery message with the appropriate action to take with the metareplace command.

State: Last Erred
Meaning: Multiple slices have encountered errors. The state of the failed slices is either Maintenance or Last Erred. In this state, no I/O is attempted on the slice that is in the Maintenance state, but I/O is attempted to the slice marked Last Erred with the outcome being the overall status of the I/O request.
Action: Enable or replace the failed slices. See How to Enable a Component in a RAID 5 Volume on page 149, or How to Replace a Component in a RAID 5 Volume on page 150. The metastat command will show an invoke recovery message with the appropriate action to take with the metareplace command, which must be run with the -f flag. This state indicates that data might be fabricated due to multiple failed slices.
1. Make sure that you have a current backup of all data and that you have root access.
2. Read Background Information for Creating RAID 5 Volumes on page 140.
3. To attach additional components to a RAID 5 volume, use one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then choose Attach Component and follow the instructions. For more information, see the online help. Use the following form of the metattach command:
metattach volume-name name-of-component-to-add
- volume-name is the name for the volume to expand.
- name-of-component-to-add specifies the name of the component to attach to the RAID 5 volume.
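A sketch of such a command, using the names from the example described below (the confirmation message is illustrative):

# metattach d2 c2t1d0s2
d2: column is attached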
This example shows the addition of slice c2t1d0s2 to an existing RAID 5 volume named d2.
1. Make sure that you have a current backup of all data and that you have root access.
2. To enable a failed component in a RAID 5 volume, use one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then choose the failed component. Click Enable Component and follow the instructions. For more information, see the online help. Use the following form of the metareplace command:
metareplace -e volume-name component-name
- -e specifies to replace the failed component with a component at the same location (perhaps after physically replacing a disk).
- volume-name is the name of the volume with a failed component.
- component-name specifies the name of the component to replace.
metareplace automatically starts resynchronizing the new component with the rest of the RAID 5 volume.
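For instance, a command sketch for the example that follows (the confirmation message is illustrative):

# metareplace -e d20 c2t0d0s2
d20: device c2t0d0s2 is enabled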
In this example, the RAID 5 volume d20 has a slice, c2t0d0s2, which had a soft error. The metareplace command with the -e option enables the slice.
You can use the metareplace command on non-failed devices to change a disk slice or other component. This procedure can be useful for tuning the performance of RAID 5 volumes.
1. Make sure that you have a current backup of all data and that you have root access.
2. Use one of the following methods to determine which slice of the RAID 5 volume needs to be replaced:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then view the status of the individual components. For more information, see the online help. Use the metastat command.
Look for the keyword Maintenance to identify the failed slice.
3. Use one of the following methods to replace the failed slice with another slice:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then choose the failed component. Click Replace Component and follow the instructions. For more information, see the online help. Use the following form of the metareplace command:
metareplace volume-name failed-component new-component
- volume-name is the name of the volume with a failed component.
- failed-component specifies the name of the component to replace.
- new-component specifies the name of the component to add to the volume in place of the failed component.
See the metareplace(1M) man page for more information.
4. To verify the status of the replacement slice, use one of the methods described in Step 2. The state of the replaced slice should be Resyncing or Okay.
# metastat d1
d1: RAID
    State: Needs Maintenance
    Invoke: metareplace d1 c0t14d0s6 <new device>
...
# metareplace d1 c0t14d0s6 c0t4d0s6
d1: device c0t14d0s6 is replaced with c0t4d0s6
# metastat d1
d1: RAID
    State: Resyncing
    Resync in progress: 98% done
    Interlace: 32 blocks
    Size: 8087040 blocks
Original device:
    Size: 8087520 blocks
        Device       Start Block  Dbase State        Hot Spare
        c0t9d0s6          330     No    Okay
        c0t13d0s6         330     No    Okay
        c0t10d0s6         330     No    Okay
        c0t11d0s6         330     No    Okay
        c0t12d0s6         330     No    Okay
        c0t4d0s6          330     No    Resyncing
In this example, the metastat command displays the action to take to recover from the failed slice in the d1 RAID 5 volume. After locating an available slice, the metareplace command is run, specifying the failed slice first, then the replacement slice. (If no other slices are available, run the metareplace command with the -e option to attempt to recover from possible soft errors by resynchronizing the failed device.) If multiple errors exist, the slice in the Maintenance state must first be replaced or enabled. Then the slice in the Last Erred state can be repaired. After the metareplace command, the metastat command monitors the progress of the resynchronization. During the replacement, the state of the volume and the new slice will be Resyncing. You can continue to use the volume while it is in this state.
CHAPTER 15
Hot Spare Pools (Overview)
- Overview of Hot Spares and Hot Spare Pools on page 153
- How Hot Spares Work on page 154
- Administering Hot Spare Pools on page 156
A hot spare cannot be used to hold data or state database replicas while it is idle. A hot spare must remain ready for immediate use in the event of a slice failure in the volume with which it is associated. To use hot spares, you must invest in additional disks beyond those disks that the system actually requires to function.
Hot Spares
A hot spare is a slice (not a volume) that is functional and available, but not in use. A hot spare is reserved, meaning that it stands ready to substitute for a failed slice in a submirror or RAID 5 volume. Hot spares provide protection from hardware failure because slices from RAID 1 or RAID 5 volumes are automatically replaced and resynchronized when they fail. The hot spare can be used temporarily until a failed submirror or RAID 5 volume slice can be either fixed or replaced. You create hot spares within hot spare pools. Individual hot spares can be included in one or more hot spare pools. For example, you might have two submirrors and two hot spares. The hot spares can be arranged as two hot spare pools, with each pool having the two hot spares in a different order of preference. This strategy enables you to specify which hot spare is used first, and it improves availability by having more hot spares available. A submirror or RAID 5 volume can use only a hot spare whose size is equal to or greater than the size of the failed slice in the submirror or RAID 5 volume. If, for example, you have a submirror made of 1 Gbyte drives, a hot spare for the submirror must be 1 Gbyte or greater.
When the slice experiences an I/O error, the failed slice is placed in the Broken state. To fix this condition, first repair or replace the failed slice. Then, bring the slice back to the Available state by using the Enhanced Storage tool within the Solaris Management Console or the metahs -e command. When a submirror or RAID 5 volume is using a hot spare in place of a failed slice and that failed slice is enabled or replaced, the hot spare is then marked Available in the hot spare pool, and is again ready for use.
When I/O errors occur, Solaris Volume Manager checks the hot spare pool for the first available hot spare whose size is equal to or greater than the size of the slice that is being replaced. If found, Solaris Volume Manager changes the hot spare's status to In-Use and automatically resynchronizes the data. In the case of a mirror, the hot spare is resynchronized with data from a good submirror. In the case of a RAID 5 volume, the hot spare is resynchronized with the other slices in the volume. If a slice of adequate size is not found in the list of hot spares, the submirror or RAID 5 volume that failed goes into a failed state and the hot spares remain unused. In the case of the submirror, the submirror no longer replicates the data completely. In the case of the RAID 5 volume, data redundancy is no longer available.
FIGURE 15-1
Scenario: Hot Spares
Hot spares provide extra protection for redundant volumes (RAID 1 and RAID 5) to help guard against data loss. By associating hot spares with the underlying slices that comprise your RAID 0 submirrors or RAID 5 configuration, you can have the system automatically replace failed slices with good slices from the hot spare pool. Those slices that were swapped into use are updated with the information they should have, then can continue to function just like the original. You can replace them at your convenience.
CHAPTER 16
Hot Spare Pools (Tasks)
- Create a hot spare pool: Use the Solaris Volume Manager GUI or the metainit command to create a hot spare pool. See How to Create a Hot Spare Pool on page 160.
- Add slices to a hot spare pool: Use the Solaris Volume Manager GUI or the metahs command to add slices to a hot spare pool. See How to Add Additional Slices to a Hot Spare Pool on page 161.
- Associate a hot spare pool with a volume: Use the Solaris Volume Manager GUI or the metaparam command to associate a hot spare pool with a volume. See How to Associate a Hot Spare Pool With a Volume on page 162.
- Change which hot spare pool is associated with a volume: Use the Solaris Volume Manager GUI or the metaparam command to change which hot spare pool is associated with a volume. See How to Change the Associated Hot Spare Pool on page 164.
- Check the status of hot spares and hot spare pools: Use the Solaris Volume Manager GUI, or the metastat or metahs -i commands to check the status of a hot spare or hot spare pool. See How to Check the Status of Hot Spares and Hot Spare Pools on page 165.
- Replace a hot spare in a hot spare pool: Use the Solaris Volume Manager GUI or the metahs command to replace a hot spare in a hot spare pool. See How to Replace a Hot Spare in a Hot Spare Pool on page 166.
- Delete a hot spare from a hot spare pool: Use the Solaris Volume Manager GUI or the metahs command to delete a hot spare from a hot spare pool. See How to Delete a Hot Spare From a Hot Spare Pool on page 168.
- Enable a hot spare: Use the Solaris Volume Manager GUI or the metahs command to enable a hot spare in a hot spare pool. See How to Enable a Hot Spare on page 169.
Caution Solaris Volume Manager will not warn you if you create a hot spare that is not large enough. If the hot spare is not equal to, or larger than, the volume to which it is attached, the hot spare will not work.
1. Check Prerequisites for Creating Solaris Volume Manager Components on page 46.
From the Enhanced Storage tool within the Solaris Management Console, open the Hot Spare Pools node, then choose Action->Create Hot Spare Pool. For more information, see the online help. Use the following form of the metainit command:
metainit hot-spare-pool-name ctds-for-slice
where ctds-for-slice is repeated for each slice in the hot spare pool. See the metainit(1M) man page for more information.
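For example, a sketch with hypothetical slice names (the confirmation message is illustrative):

# metainit hsp001 c2t2d0s2 c3t2d0s2
hsp001: Hotspare pool is setup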
In this example, the hot spare pool hsp001 contains two disks as the hot spares. The system confirms that the hot spare pool has been set up.
From the Enhanced Storage tool within the Solaris Management Console, open the Hot Spare Pools node, then choose the hot spare pool you want to change. Choose Action->Properties, then choose the Components panel. For more information, see the online help. Use the following form of the metahs command:
metahs -a hot-spare-pool-name slice-to-add
Use -a with hot-spare-pool-name to add the slice to the specified hot spare pool. Use -a with the keyword all in place of hot-spare-pool-name to add the slice to all hot spare pools. See the metahs(1M) man page for more information.
Note You can add a hot spare to one or more hot spare pools. When you add a hot spare to a hot spare pool, it is added to the end of the list of slices in the hot spare pool.
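A command sketch for the example described below (the confirmation message is illustrative):

# metahs -a hsp001 /dev/dsk/c3t0d0s2
hsp001: Hotspare is added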
In this example, the -a option adds the slice /dev/dsk/c3t0d0s2 to hot spare pool hsp001. The system verifies that the slice has been added to the hot spare pool.
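A command sketch for the next example; the system prints a confirmation line for each hot spare pool:

# metahs -a all /dev/dsk/c3t0d0s2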
In this example, the -a option with the all keyword adds the slice /dev/dsk/c3t0d0s2 to all hot spare pools configured on the system. The system verifies that the slice has been added to all hot spare pools.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes and choose a volume. Choose Action->Properties, then choose the Hot Spare Pool panel and Attach HSP. For more information, see the online help.
Use the following form of the metaparam command:

metaparam -h hot-spare-pool component

-h              Specifies to modify the hot spare pool named.
hot-spare-pool  Is the name of the hot spare pool.
component       Is the name of the submirror or RAID 5 volume to which the hot spare pool is being attached.
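For example, to associate hot spare pool hsp100 with both submirrors of a mirror, a possible invocation is sketched here (the volume names come from the example described below):

# metaparam -h hsp100 d10
# metaparam -h hsp100 d11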
The -h option associates a hot spare pool, hsp100, with two submirrors, d10 and d11, of mirror, d0. The metastat command shows that the hot spare pool is associated with the submirrors.
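A sketch of the command for the example described below:

# metaparam -h hsp001 d10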
The -h option associates a hot spare pool named hsp001 with a RAID 5 volume named d10. The metastat command shows that the hot spare pool is associated with the RAID 5 volume.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node and choose the volume. Choose Action->Properties, then choose the Hot Spare Pool panel. Detach the unwanted hot spare pool and detach the new hot spare pool by following the instructions. For more information, see the online help. Use the following form of the metaparam command:
metaparam -h hot-spare-pool-name RAID5-volume-or-submirror-name
-h              Specifies to modify the hot spare pool named.
hot-spare-pool  Is the name of the new hot spare pool, or the special keyword none to remove hot spare pool associations.
component       Is the name of the submirror or RAID 5 volume to which the hot spare pool is being attached.
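A command sketch for the example described below:

# metaparam -h hsp002 d4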
In this example, the hot spare pool hsp001 is initially associated with a RAID 5 volume named d4. The hot spare pool association is changed to hsp002. The metastat command shows the hot spare pool association before and after this change.
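A command sketch for the next example:

# metaparam -h none d4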
In this example, the hot spare pool hsp001 is initially associated with a RAID 5 volume named d4. The hot spare pool association is changed to none, which indicates that no hot spare pool should be associated with this device. The metastat command shows the hot spare pool association before and after this change.
How to Check the Status of Hot Spares and Hot Spare Pools
To view the status of a hot spare pool and its hot spares, use one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Hot Spare Pools node and select a hot spare pool. Choose Action->Properties to get detailed status information. For more information, see the online help. Run the following form of the metastat command:
metastat hot-spare-pool-name
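A hypothetical example of the output (the pool name, slice name, and size are illustrative):

# metastat hsp001
hsp001: 1 hot spare
        c1t3d0s2                Available       16800 blocks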
The metahs command can also be used to check the status of a hot spare pool.
State: Available
Meaning: The hot spares are running and ready to accept data, but are not currently being written to or read from.
Action: None.

State: In-use
Meaning: This hot spare pool includes slices that have been used to replace failed components in a redundant volume.
Action: Diagnose how the hot spares are being used. Then, repair the slice in the volume for which the hot spare is being used.

State: Broken
Meaning: There is a problem with a hot spare or hot spare pool, but there is no immediate danger of losing data. This status is also displayed if all the hot spares are in use or if any hot spares are broken.
Action: Diagnose how the hot spares are being used or why they are broken. You can add more hot spares to the hot spare pool, if desired.
From the Enhanced Storage tool within the Solaris Management Console, open the Hot Spare Pools node and select a hot spare pool. Choose Action->Properties, then choose the Hot Spares panel and follow the instructions. For more information, see the online help. Use the following form of the metastat command:
metastat hot-spare-pool-name
See the metastat(1M) man page.
2. To replace the hot spare, use one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Hot Spare Pools node and select a hot spare pool. Choose Action->Properties, then choose the Hot Spares panel and follow the instructions. For more information, see the online help. Use the following form of the metahs command:
metahs -r hot-spare-pool-name current-hot-spare replacement-hot-spare
-r                      Specifies to replace disks in the hot spare pool named.
hot-spare-pool-name     Is the name of the hot spare pool, or the special keyword all to change all hot spare pool associations.
current-hot-spare       Is the name of the current hot spare that will be replaced.
replacement-hot-spare   Is the name of the slice to take the place of the current hot spare in the named pools.
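A command sketch for the example described below (the confirmation message is illustrative):

# metahs -r hsp003 /dev/dsk/c0t2d0s2 /dev/dsk/c3t1d0s2
hsp003: Hotspare /dev/dsk/c0t2d0s2 is replaced with /dev/dsk/c3t1d0s2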
In this example, the metastat command makes sure that the hot spare is not in use. The metahs -r command replaces hot spare /dev/dsk/c0t2d0s2 with /dev/dsk/c3t1d0s2 in the hot spare pool hsp003.
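A command sketch for the next example:

# metahs -r all /dev/dsk/c1t0d0s2 /dev/dsk/c3t1d0s2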
In this example, the keyword all replaces hot spare /dev/dsk/c1t0d0s2 with /dev/dsk/c3t1d0s2 in all its associated hot spare pools.
From the Enhanced Storage tool within the Solaris Management Console, open the Hot Spare Pools node and select a hot spare pool. Choose Action->Properties, then choose the Hot Spares panel and follow the instructions. For more information, see the online help. Use the following form of the metastat command:
metastat hot-spare-pool-name
See the metastat(1M) man page.
2. To delete the hot spare, use one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Hot Spare Pools node and select a hot spare pool. Choose Action->Properties, then choose the Hot Spares panel and follow the instructions. For more information, see the online help. Use the following form of the metahs command:
metahs -d hot-spare-pool-name current-hot-spare
-d                  Specifies to delete a hot spare from the hot spare pool named.
hot-spare-pool      Is the name of the hot spare pool, or the special keyword all to delete from all hot spare pools.
current-hot-spare   Is the name of the current hot spare that will be deleted.
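A command sketch for the example described below:

# metahs -d hsp003 /dev/dsk/c0t2d0s2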
In this example, the metastat command makes sure that the hot spare is not in use. The metahs -d command deletes hot spare /dev/dsk/c0t2d0s2 in the hot spare pool hsp003.
From the Enhanced Storage tool within the Solaris Management Console, open the Hot Spare Pools node and select a hot spare pool. Choose Action->Properties, then the Hot Spares panel and follow the instructions. For more information, see the online help. Use the following form of the metahs command:
metahs -e hot-spare-slice
-e               Specifies to enable the hot spare and return it to the Available state.
hot-spare-slice  Is the name of the hot spare slice to enable.
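A command sketch for the example described below:

# metahs -e /dev/dsk/c0t0d0s2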
In this example, the command places the hot spare /dev/dsk/c0t0d0s2 in the Available state after it has been repaired. It is unnecessary to specify a hot spare pool.
CHAPTER 17
Transactional Volumes (Overview)
- About File System Logging on page 171
- Background Information for Transactional Volumes on page 175
- Scenario: Transactional Volumes on page 178
Note Transactional volumes are scheduled to be removed from the Solaris operating environment in an upcoming Solaris release. UFS logging, available since the Solaris 8 release, provides the same capabilities but superior performance, as well as lower system administration requirements and overhead. These benefits provide a clear choice for optimal performance and capabilities.
At reboot, the system discards incomplete transactions, but applies the transactions for completed operations. The file system remains consistent because only completed transactions are ever applied. Because the file system is never inconsistent, it does not need checking by the fsck command. A system crash can interrupt current system calls and introduce inconsistencies into an unlogged UFS. If you mount a UFS without running the fsck command, these inconsistencies can cause panics or corrupt data. Checking large file systems takes a long time, because it requires reading and verifying the file system data. With UFS logging, UFS file systems do not have to be checked at boot time because the changes from unfinished system calls are discarded.
- Transactional volumes can write log information onto physically separate devices, while UFS logging combines logs and file systems on the same volume.
- UFS logging offers superior performance to transactional volumes in all cases.
- UFS logging allows logging of all UFS file systems, including root (/), while transactional volumes cannot log the root (/) file system.
Note Transactional volumes are scheduled to be removed from the Solaris operating environment in an upcoming Solaris release. UFS logging, available since the Solaris 8 release, provides the same capabilities but superior performance, as well as lower system administration requirements and overhead. These benefits provide a clear choice for optimal performance and capabilities.
To enable UFS logging, use the logging option of mount_ufs on the file system, or add logging to the mount options for the file system in the /etc/vfstab file. For more information about mounting file systems with UFS logging enabled, see Mounting and Unmounting File Systems (Tasks) in System Administration Guide: Basic Administration and the mount_ufs(1M) man page. To learn more about using transactional volumes, continue reading this document.
Note If you are not currently logging UFS file systems but want to use this feature, choose UFS logging, rather than transactional volumes.
Transactional Volumes
A transactional volume is a volume that is used to manage file system logging, which is essentially the same as UFS logging. Both methods record UFS updates in a log before the updates are applied to the file system. A transactional volume consists of two devices:
- The master device is a slice or volume that contains the file system that is being logged.
- The log device is a slice or volume that contains the log and can be shared by several file systems. The log is a sequence of records, each of which describes a change to a file system.
Caution A log device or a master device can be a physical slice or a volume. However, to improve reliability and availability, use RAID 1 volumes (mirrors) for log devices. A device error on a physical log device could cause data loss. You can also use RAID 1 or RAID 5 volumes as master devices.
Logging begins automatically when the transactional volume is mounted, provided the transactional volume has a log device. The master device can contain an existing UFS file system (because creating a transactional volume does not alter the master device). Or, you can create a file system on the transactional volume later. Likewise, clearing a transactional volume leaves the UFS file system on the master device intact. After you configure a transactional volume, you can use it as though it were a physical slice or another logical volume. For information about creating a transactional volume, see Creating Transactional Volumes on page 183.
Example: Transactional Volume
The following figure shows a transactional volume, d1, which consists of a master device, d3, and a mirrored log device, d30.
FIGURE 17-1 Transactional Volume Example
FIGURE 17-2
- Before you create transactional volumes, identify the slices or volumes to be used as the master devices and log devices.
- Log any UFS file system except root (/).
- Use a mirrored log device for data redundancy.
- Do not place logs on heavily used disks.
- Plan for a minimum of 1 Mbyte of storage space for logs. (Larger logs permit more simultaneous file system transactions.) Plan on using an additional 1 Mbyte of log space per 100 Mbytes of file system data, up to a maximum recommended log size of 64 Mbytes. Although the maximum possible log size is 1 Gbyte, logs larger than 64 Mbytes are rarely fully used and often waste storage space.
- The log device and the master device of the same transactional volume should be located on separate drives and possibly separate controllers to help balance the I/O load.
- Transactional volumes can share log devices. However, heavily used file systems should have separate logs. The disadvantage of sharing a log device is that certain errors require that all file systems that share the log device must be checked with the fsck command.
- Once you set up a transactional volume, you can share the log device among file systems.
- Logs (log devices) are typically accessed frequently. For best performance, avoid placing logs on disks with high usage. You might also want to place logs in the middle of a disk, to minimize the average seek times when accessing the log.
- The larger the log size, the better the performance. Larger logs allow for greater concurrency (more simultaneous file system operations per second).
Note Mirroring log devices is strongly recommended. Losing the data in a log device because of device errors can leave a file system in an inconsistent state that fsck might be unable to fix without user intervention. Using a RAID 1 volume for the master device is a good idea to ensure data redundancy.
Generally, you should log your largest UFS file systems and the UFS file system whose data changes most often. It is probably not necessary to log small file systems with mostly read-only activity.
If no slice is available for the log device, you can still configure a transactional volume. This strategy might be useful if you plan to log exported file systems when you do not have a spare slice for the log device. When a slice is available, you only need to attach it as a log device. Consider sharing a log device among file systems if your system does not have many available slices, or if the file systems sharing the log device are primarily read, not written.
Caution When one master device of a shared log device goes into a failed state, the log device is unable to roll its changes forward. This problem causes all master devices sharing the log device to go into the hard error state.
- Device, which is the device name of the slice or volume
- Start Block, which is the block on which the device begins
- Dbase, which shows if the device contains a state database replica
- State, which shows the state of the log device
The following table explains transactional volume states and possible actions to take.
TABLE 17-1 Transactional Volume States

State: Okay
Meaning: The device is functioning properly. If mounted, the file system is logging and will not be checked at boot.
Action: None.

State: Attaching
Meaning: The log device will be attached to the transactional volume when the volume is closed or unmounted. When this occurs, the device transitions to the Okay state.
Action: None.

State: Detached
Meaning: The transactional volume does not have a log device. All benefits from UFS logging are disabled.
Action: The fsck command automatically checks the device at boot time. See the fsck(1M) man page.

State: Detaching
Meaning: The log device will be detached from the transactional volume when the volume is closed or unmounted. When this occurs, the device transitions to the Detached state.
Action: None.

State: Hard Error
Meaning: A device error or panic has occurred while the device was in use. An I/O error is returned for every read or write until the device is closed or unmounted. The first open causes the device to transition to the Error state.
Action: Fix the transactional volume. See How to Recover a Transactional Volume With a Panic on page 199, or How to Recover a Transactional Volume With Hard Errors on page 200.

State: Error
Meaning: The device can be read and written to. The file system can be mounted read-only. However, an I/O error is returned for every read or write that actually gets a device error. The device does not transition back to the Hard Error state, even when a later device error occurs.
Action: Fix the transactional volume. See How to Recover a Transactional Volume With a Panic on page 199, or How to Recover a Transactional Volume With Hard Errors on page 200. Successfully completing the fsck or newfs commands transitions the device into the Okay state. When the device is in the Hard Error or Error state, the fsck command automatically checks and repairs the file system at boot time. The newfs command destroys whatever data might be on the device.
Scenario - Transactional Volumes
Transactional volumes provide logging capabilities for UFS file systems, similar to UFS Logging. The following example, drawing on the sample system explained in Chapter 4, describes how transactional volumes can help speed reboot by providing file system logging.
Note Unless your situation requires the special capabilities of transactional volumes, specifically the ability to log to a different device than the logged device, consider using UFS logging instead. UFS logging provides superior performance to transactional volumes.
The sample system has several logical volumes that should be logged to provide maximum uptime and availability, including the root (/) and /var mirrors. By configuring transactional volumes to log to a third RAID 1 volume, you can provide redundancy and speed the reboot process.
CHAPTER 18

Transactional Volumes (Tasks)
Create a transactional volume
    Description: Use the Solaris Volume Manager GUI or the metainit command to create a transactional volume.
    Instructions: How to Create a Transactional Volume on page 183

Convert a transactional volume to UFS logging
    Description: Use the metaclear and mount commands to clear a transactional volume and mount the file system with UFS logging.
    Instructions: How to Convert a Transactional Volume to UFS Logging on page 186

Check the status of transactional volumes
    Description: Use the Solaris Volume Manager GUI or the metastat command to check the status of a transactional volume.

Attach a log device to a transactional volume
    Description: Use the Solaris Volume Manager GUI or the metattach command to attach a log device.
    Instructions: How to Attach a Log Device to a Transactional Volume on page 191

Detach a log device from a transactional volume
    Description: Use the Solaris Volume Manager GUI or the metadetach command to detach a log device.
    Instructions: How to Detach a Log Device from a Transactional Volume on page 192

Expand a transactional volume
    Description: Use the Solaris Volume Manager GUI or the metattach command to expand a transactional volume.
    Instructions: How to Expand a Transactional Volume on page 192

Delete a transactional volume
    Description: Use the Solaris Volume Manager GUI, the metadetach command, or the metarename command to delete a transactional volume.
    Instructions: How to Remove a Transactional Volume on page 194

Delete a transactional volume and retain the mount point
    Description: Use the Solaris Volume Manager GUI or the metadetach command to delete a transactional volume.
    Instructions: How to Remove a Transactional Volume and Retain the Mount Device on page 195

Share a log device
    Description: Use the Solaris Volume Manager GUI or the metainit command to share a transactional volume log device.
    Instructions: How to Share a Log Device Among File Systems on page 198

Recover a transactional volume with a panic
    Description: Use the fsck command to recover a transactional volume with a panic.
    Instructions: How to Recover a Transactional Volume With a Panic on page 199

Recover a transactional volume with hard errors
    Description: Use the fsck command to recover a transactional volume with hard errors.
    Instructions: How to Recover a Transactional Volume With Hard Errors on page 200
Caution Solaris Volume Manager transactional volumes do not support large (greater than 1TB) volumes. In all cases, UFS logging (see mount_ufs(1M)) provides better performance than transactional volumes, and UFS logging does support large volumes as well. See Overview of Large Volume Support in Solaris Volume Manager on page 47 for more information about large volume support in Solaris Volume Manager.
Note If the file system cannot be unmounted, you can continue, but will have to reboot the system before the transactional volume can be active.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose Action->Create Volume and follow the instructions in the wizard. For more information, see the online help. Use the following form of the metainit command:
metainit trans-volume -t master-device log-device
master-device is the name of the device containing the file system you want to log.

log-device is the name of the device that will contain the log.
The master device and log device can be either slices or logical volumes. See the metainit(1M) man page for more information. For example, to create a transactional volume (d10) logging the file system on slice c0t0d0s6 to a log on c0t0d0s7, use the following syntax:
# metainit d10 -t c0t0d0s6 c0t0d0s7
Note You can use the same log device (c0t0d0s7 in this example) for several transactional volumes.
4. Edit the /etc/vfstab file so that the existing UFS file system information is replaced with that of the created transactional volume. For example, if /export was on c0t0d0s6, and the new transactional volume is d10, edit /etc/vfstab as shown here, so the mount points to the transactional volume rather than to the raw disk slice:
#/dev/dsk/c0t0d0s6   /dev/rdsk/c0t0d0s6   /export   ufs   2   yes   -
/dev/md/dsk/d10      /dev/md/rdsk/d10     /export   ufs   2   yes   -
The slice /dev/dsk/c0t2d0s2 contains a file system mounted on /home1. The slice that will contain the log device is /dev/dsk/c2t2d0s1. First, the file system is unmounted. The metainit command with the -t option creates the transactional volume, d63. Next, the /etc/vfstab file must be edited so that the entry for the file system references the transactional volume rather than the raw disk slice, as sketched below.
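A sketch of the /etc/vfstab change for this example; the device and mount point names come from the text above, but the fsck pass and mount-at-boot fields shown here are assumptions:

/dev/dsk/c0t2d0s2   /dev/rdsk/c0t2d0s2   /home1   ufs   2   yes   -

becomes:

/dev/md/dsk/d63     /dev/md/rdsk/d63     /home1   ufs   2   yes   -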
Logging becomes effective for the file system when it is remounted. On subsequent reboots, instead of checking the file system, the fsck command displays a log message for the transactional volume:
# reboot ... /dev/md/rdsk/d63: is logging
Slice /dev/dsk/c0t3d0s6 contains the /usr file system. The slice that will contain the log device is /dev/dsk/c1t2d0s1. Because /usr cannot be unmounted, the metainit command is run with the -f option to force the creation of the transactional volume, d20. Next, the line in the /etc/vfstab file that mounts the file system must be changed to reference the transactional volume. For example, the following line:

/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 1 no -

is changed to reference the transactional volume d20:

/dev/md/dsk/d20 /dev/md/rdsk/d20 /usr ufs 1 no -
Logging becomes effective for the file system when the system is rebooted.
RAID 1 volume d30 contains a file system that is mounted on /home1. The mirror that will contain the log device is d12. First, the file system is unmounted. The metainit command with the -t option creates the transactional volume, d64. Next, the line in the /etc/vfstab file that mounts the file system must be changed so that it references the transactional volume d64 rather than the mirror d30.
Logging becomes effective for the file system when the file system is remounted. On subsequent file system remounts or system reboots, instead of checking the file system, the fsck command displays a log message for the transactional volume:
# reboot ... /dev/md/rdsk/d64: is logging
To avoid editing the /etc/vfstab le, you can use the metarename(1M) command to exchange the name of the original logical volume and the new transactional volume. For more information, see Renaming Volumes on page 232.
# metastat
d2: Trans
    State: Okay
    Size: 2869209 blocks
    Master Device: d0
    Logging Device: d20

d20: Logging device for d2
    State: Okay
    Size: 28470 blocks

d20: Concat/Stripe
    Size: 28728 blocks
    Stripe 0: (interlace: 32 blocks)
        Device      Start Block   Reloc   Hot Spare
        d10                   0   No
        d11                   0   No
        d12
Note the names for these devices for later use. 2. Check to see if the Trans device is currently mounted by using the df command and searching for the name of the transactional volume in the output. If the transactional volume is not mounted, go to Step 7.
# df | grep d2 /mnt/transvolume (/dev/md/dsk/d2 ): 2782756 blocks 339196 files
3. Verify adequate free space on the transactional volume by using the df -k command.
# df -k /mnt/transvolume
filesystem            kbytes     used    avail  capacity  Mounted on
/dev/md/dsk/d2       1391387    91965  1243767        7%  /mnt/transvolume
4. Stop all activity on the file system, either by halting applications or bringing the system to single-user mode.
# init s
[root@lexicon:lexicon-setup]$ init s
INIT: New run level: S
The system is coming down for administration.  Please wait.
Dec 11 08:14:43 lexicon syslogd: going down on signal 15
Killing user processes: done.
INIT: SINGLE USER MODE

Type control-d to proceed with normal startup,
(or give root password for system maintenance):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

Dec 11 08:15:52 su: su root succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.9       s81_51  May 2002
#
5. Flush the log for the file system that is logged with lockfs -f.
# /usr/sbin/lockfs -f /mnt/transvolume
6. Unmount the file system.

7. Clear the transactional volume that contains the file system. This operation will not affect the data on the file system.
# metaclear d2 d2: Trans is cleared
The Logging device, identified at the beginning of this procedure, is now unused and can be reused for other purposes. The master device, also identified at the beginning of this procedure, contains the file system and must be mounted for use.

8. Edit the /etc/vfstab file to update the mount information for the file system. You must change the raw and block mount points, and add logging to the options for that file system. With the transactional volume in use, the /etc/vfstab entry looks like this:
/dev/md/dsk/d2 /dev/md/rdsk/d2 /mnt/transvolume ufs 1 no -
After you update the file to change the mount point from the transactional volume d2 to the underlying device d0, and add the logging option, that part of the /etc/vfstab file looks like this:
#device             device              mount              FS    fsck   mount    mount
#to mount           to fsck             point              type  pass   at boot  options
#
/dev/md/dsk/d0      /dev/md/rdsk/d0     /mnt/transvolume   ufs   1      no       logging
9. Remount the file system.

Note The mount command might report an error similar to the following: the state of /dev/md/dsk/d0 is not okay and it was attempted to be mounted read/write. Please run fsck and try again. If this happens, run fsck on the raw device (fsck /dev/md/rdsk/d0 in this case), answer y to fixing the file system state in the superblock, and try again.
10. Verify that the file system is mounted with logging enabled by examining the /etc/mnttab file and confirming that the file system has logging listed as one of the options.
# grep mnt /etc/mnttab mnttab /etc/mnttab mntfs dev=43c0000 1007575477 /dev/md/dsk/d0 /mnt/transvolume ufs rw,intr,largefiles, logging,xattr,onerror=panic,suid,dev=1540000 1008085006
11. If you changed to single-user mode during the process, you can now return to multiuser mode.
c1t12d0s0: Logging device for d50
    State: Okay
    Size: 30269 blocks

        Logging Device     Start Block   Dbase   Reloc
        c1t12d0s0                 5641   No      Yes
Make note of the master and log devices as you will need this information in subsequent steps.

Determine if the transactional volume contains a mounted file system:

# df | grep d50
/home1            (/dev/md/dsk/d50  ):   161710 blocks    53701 files

Verify sufficient free space (more than 1 Mbyte):

# df -k /home1
filesystem            kbytes     used    avail  capacity  Mounted on
/dev/md/dsk/d50        95510    14655                     /home1

Go to single-user mode.

# /usr/sbin/lockfs -f /home1
# /usr/sbin/umount /home1
# /usr/sbin/metaclear d50
d50: Trans is cleared

Update the /etc/vfstab file to mount the underlying volume and add the logging option.

# mount /home1
# /usr/bin/grep /home1 /etc/mnttab
/dev/dsk/c1t14d0s0   /home1   ufs   rw,intr,largefiles,logging,xattr,onerror=panic,suid,dev=740380   1008019906

Return to multi-user mode.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then view the status of the volumes. Right-click a transactional volume and choose Properties for more detailed status information. For more information, see the online help. Use the metastat command. For more information, see the metastat(1M) man page.
c0t2d0s3: Logging device for d0
    State: Okay
    Size: 5350 blocks

        Logging Device     Start Block   Dbase
        c0t2d0s3                    250   No
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the transactional volume from the listing. Right-click the volume, and choose Properties. For more information, see the online help. Use the following form of the metattach command:
metattach master-volume logging-volume
master-volume is the name of the transactional volume that contains the file system to be logged.

logging-volume is the name of the volume or slice that should contain the log.

See the metattach(1M) man page for more information.
# metattach d1 d23
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the transactional volume from the listing. Right-click the volume, and choose Properties. For more information, see the online help. Use the following form of the metadetach command:
metadetach master-volume
master-volume is the name of the transactional volume that contains the file system that is being logged. See the metadetach(1M) man page for more information.

4. Remount the file system.
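A minimal sketch of this procedure from the command line, assuming a transactional volume d1 whose file system is mounted on /fs1 (both names are placeholders):

# umount /fs1
# metadetach d1
# mount /fs1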
1. Check Prerequisites for Creating Solaris Volume Manager Components on page 46 and Background Information for Transactional Volumes on page 175. 2. If the master device is a volume (rather than a basic slice), attach additional slices to the master device by using one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the transactional volume from the listing. Right-click the volume, and choose Properties, then the Components panel. For more information, see the online help. Use the following form of the metattach command:
metattach master-volume component
master-volume is the name of the transactional volume that contains the file system to be logged. component is the name of the volume or slice that should be attached. See the metattach(1M) man page for more information.
Note If the master device is a mirror, you need to attach additional slices to each submirror.
3. If the master device is a slice, you cannot expand it directly. Instead, you must do the following:
- Clear the existing transactional volume.
- Put the master device's slice into a volume.
- Recreate the transactional volume.
Once you have completed this process, you can expand the master device as explained in the previous steps of this procedure.
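A minimal sketch of that process, using placeholder names throughout (transactional volume d1, its file system /fs1, master slice c0t0d0s6, log device d2, and new volume d5 are all assumptions):

# umount /fs1
# metaclear d1
# metainit d5 1 1 c0t0d0s6
# metainit d1 -t d5 d2
# mount /fs1

Once the master device is the volume d5, it can be expanded with the metattach command as described in the previous steps.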
    Submirror 1: d12
      State: Okay
...
# metattach d11 c0t2d0s5
d11: component is attached
# metattach d12 c0t3d0s5
d12: component is attached
This example shows the expansion of a transactional device, d10, whose master device consists of a two-way RAID 1 volume, d0, which contains two submirrors, d11 and d12. The metattach command is run on each submirror. The system confirms that each slice was attached.
3. Detach the log device from the transactional volume by using one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the transactional volume from the listing. Right-click the volume, and choose Properties. For more information, see the online help. Use the following form of the metadetach command:
metadetach master-volume
master-volume is the name of the transactional volume that contains the file system that is being logged. See the metadetach(1M) man page for more information.

4. Remove (clear) the transactional volume by using one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the transactional volume from the listing. Right-click the volume, and choose Delete. For more information, see the online help. Use the following form of the metaclear command:
metaclear master-volume
See the metaclear(1M) man page for more information.

5. If necessary, update /etc/vfstab to mount the underlying volume, rather than the transactional volume you just cleared.

6. Remount the file system.
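A minimal command sketch of this procedure, assuming a transactional volume d1 built on master volume d0 and mounted on /fs1 (all three names are placeholders):

# umount /fs1
# metadetach d1
# metaclear d1
(edit /etc/vfstab so that /fs1 mounts /dev/md/dsk/d0 rather than /dev/md/dsk/d1)
# mount /fs1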
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the transactional volume from the listing. Right-click the volume, and choose Properties. For more information, see the online help.
master-volume is the name of the transactional volume that contains the file system that is being logged. See the metadetach(1M) man page for more information.

4. Exchange the name of the transactional volume with that of the master device.

5. Remove (clear) the transactional volume by using one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the transactional volume from the listing. Right-click the volume, and choose Delete. For more information, see the online help. Use the following form of the metaclear command:
metaclear master-volume
See the metaclear(1M) man page for more information.

6. Run the fsck command on the master device. When asked whether to fix the file system's state in the superblock, respond y.

7. Remount the file system.
    Size: 5350 blocks
# umount /fs2
# metadetach d1
d1: log device d0 is detached
# metarename -f -x d1 d21
d1 and d21 have exchanged identities
# metastat d21
d21: Trans
    State: Detached
    Size: 5600 blocks
    Master Device: d1
d1: Mirror
    Submirror 0: d20
      State: Okay
    Submirror 1: d2
      State: Okay
# metaclear d21
# fsck /dev/md/dsk/d1
** /dev/md/dsk/d1
** Last Mounted on /fs2
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FILE SYSTEM STATE IN SUPERBLOCK IS WRONG; FIX? y

3 files, 10 used, 2493 free (13 frags, 310 blocks, 0.5% fragmentation)
# mount /fs2
The metastat command confirms that the transactional volume, d1, is in the Okay state. The file system is unmounted before detaching the transactional volume's log device. The transactional volume and its mirrored master device are exchanged by using the -f (force) flag. Running the metastat command again confirms that the exchange occurred. The transactional volume and the log device (if desired) are cleared, in this case, d21 and d0, respectively. Next, the fsck command is run on the mirror, d1, and the prompt is answered with a y. After the fsck command is done, the file system is remounted. Note that because the mount device for /fs2 did not change, the /etc/vfstab file does not require editing.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the transactional volume from the listing. Right-click the volume, and choose Properties. For more information, see the online help. Use the following form of the metadetach command:
metadetach master-volume
See the metadetach(1M) man page for more information.

4. Attach a log device to the transactional volume by using one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then choose the transactional volume from the listing. Right-click the volume, and choose Properties. For more information, see the online help. Use the following form of the metattach command:
metattach master-volume logging-volume
See the metattach(1M) man page for more information.

5. Edit the /etc/vfstab file to modify (or add) the entry for the file system to reference the transactional volume.

6. Remount the file system. If the file system cannot be unmounted, reboot the system to force your changes to take effect.
This example shows the sharing of a log device (d10), defined as the log for a previous transactional volume, with a new transactional volume (d64). The file system to be set up as the master device is /xyzfs and is using slice /dev/dsk/c0t2d0s4. The metainit -t command specifies that the configuration is a transactional volume. The /etc/vfstab file must be edited to change (or enter for the first time) the entry for the file system so that it references the transactional volume. For example, the following line:

/dev/dsk/c0t2d0s4 /dev/rdsk/c0t2d0s4 /xyzfs ufs 2 yes -

is changed to the following:

/dev/md/dsk/d64 /dev/md/rdsk/d64 /xyzfs ufs 2 yes -
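For completeness, a sketch of the metainit command implied by this example, built only from the names given above (the confirmation message is illustrative rather than a recorded output):

# metainit d64 -t c0t2d0s4 d10
d64: Trans is setup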
The metastat command verifies that the log is being shared. Logging becomes effective for the file system when the system is rebooted. Upon subsequent reboots, instead of checking the file system, the fsck command displays these messages for the two file systems:
/dev/md/rdsk/d63: is logging. /dev/md/rdsk/d64: is logging.
If the log device fails, you must run the fsck command on each transactional volume whose file systems share the affected log device.
Only after all of the affected transactional volumes have been checked and successfully repaired will the fsck command reset the state of the failed transactional volume to Okay.
1. Check Prerequisites for Creating Solaris Volume Manager Components on page 46 and Background Information for Transactional Volumes on page 175.

2. Read Background Information for Transactional Volumes on page 175.

3. Run the lockfs command to determine which file systems are locked.
# lockfs
Affected file systems are listed with a lock type of hard. Every file system that shares the same log device would be hard locked.

4. Unmount the affected file system(s). You can unmount locked file systems even if they were in use when the error occurred. If the affected processes try to access an opened file or directory on the hard locked or unmounted file system, an error is returned.

5. (Optional) Back up any accessible data. Before you attempt to fix the device error, you might want to recover as much data as possible. If your backup procedure requires a mounted file system (such as the
tar command or the cpio command), you can mount the file system read-only. If your backup procedure does not require a mounted file system (such as the dump command or the volcopy command), you can access the transactional volume directly.

6. Fix the device error. At this point, any attempt to open or mount the transactional volume for read-and-write access starts rolling all accessible data on the log device to the appropriate master devices. Any data that cannot be read or written is discarded. However, if you open or mount the transactional volume for read-only access, the log is simply rescanned and not rolled forward to the master devices, and the error is not fixed. In other words, all data on the master device and log device remains unchanged until the first read or write open or mount.

7. Run the fsck command to repair the file system, or the newfs command if you need to restore data. Run the fsck command on all of the transactional volumes that share the same log device. When all transactional volumes have been repaired by the fsck command, they then revert to the Okay state. The newfs command will also transition the file system back to the Okay state, but the command will destroy all of the data on the file system. The newfs command is generally used when you plan to restore file systems from backup. The fsck or newfs commands must be run on all of the transactional volumes that share the same log device before these devices revert back to the Okay state.

8. Run the metastat command to verify that the state of the affected devices has reverted to Okay.
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
WARNING: md: log device: /dev/dsk/c0t0d0s6 changed state to Okay
4 files, 11 used, 4452 free (20 frags, 554 blocks, 0.4% fragmentation)
# metastat d5
d5: Trans
    State: Okay
    Size: 10080 blocks
    Master Device: d4
    Logging Device: c0t0d0s6
d4: Mirror
    State: Okay
...
c0t0d0s6: Logging device for d5
    State: Okay
...
This example shows a transactional volume, d5, which has a log device in the Hard Error state, being fixed. You must run the fsck command on the transactional volume itself, which transitions the state of the transactional volume to Okay. The metastat command confirms that the state is Okay.
CHAPTER 19

Disk Sets (Overview)
What Do Disk Sets Do? on page 203
How Does Solaris Volume Manager Manage Disk Sets? on page 204
Background Information for Disk Sets on page 208
Administering Disk Sets on page 209
Scenario - Disk Sets on page 211
Note Disk sets are intended, in part, for use with Sun Cluster, Solstice HA (High Availability), or another supported third-party HA framework. Solaris Volume Manager by itself does not provide all the functionality necessary to implement a failover configuration.
Note Although disk sets are supported in single-host configurations, they are often not appropriate for local (not dual-connected) use. Two common exceptions are the use of disk sets to provide a more manageable namespace for logical volumes, and to more easily manage storage on a Storage Area Network (SAN) fabric (see Scenario - Disk Sets on page 211).
For use in disk sets, disks must have a slice seven that meets these criteria:
- Starts at sector 0
- Includes enough space for the disk label and state database replicas
- Cannot be mounted
- Does not overlap with any other slices, including slice two
If the existing partition table does not meet these criteria, Solaris Volume Manager will repartition the disk. A small portion of each drive is reserved in slice 7 for use by Solaris Volume Manager. The remainder of the space on each drive is placed into slice 0. Any existing data on the disks is lost by repartitioning.
Tip After you add a drive to a disk set, you may repartition it as necessary, with the exception that slice 7 is not altered in any way.
The minimum size for slice seven is variable, based on disk geometry, but is always equal to or greater than 4MB. The following output from the prtvtoc command shows a disk before it is added to a disk set.
[root@lexicon:apps]$ prtvtoc /dev/rdsk/c1t6d0s0
* /dev/rdsk/c1t6d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*     133 sectors/track
*      27 tracks/cylinder
*    3591 sectors/cylinder
*    4926 cylinders
*    4924 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0   4111695   4111694
       1      3    01    4111695   1235304   5346998
       2      5    01          0  17682084  17682083
       3      0    00    5346999   4197879   9544877
       4      0    00    9544878   4197879  13742756
       5      0    00   13742757   3939327  17682083
Note If you have disk sets that you upgraded from Solstice DiskSuite software, the default state database replica size on those sets will be 1034 blocks, not the 8192 block size from Solaris Volume Manager. Also, slice 7 on the disks that were added under Solstice DiskSuite will be correspondingly smaller than slice 7 on disks that were added under Solaris Volume Manager.
After you add the disk to a disk set, the output of prtvtoc looks like the following:
[root@lexicon:apps]$ prtvtoc /dev/rdsk/c1t6d0s0
* /dev/rdsk/c1t6d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*     133 sectors/track
*      27 tracks/cylinder
*    3591 sectors/cylinder
*    4926 cylinders
*    4924 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      0    00      10773  17671311  17682083
       7      0    01          0     10773     10772
[root@lexicon:apps]$
If disks you add to a disk set have acceptable slice 7s (that start at cylinder 0 and that have sufficient space for the state database replica), they will not be reformatted.
Volume path names include the disk set name after /dev/md/ and before the actual volume name in the path. The following table shows some example disk set volume names.
TABLE 19-1 Example Volume Names

/dev/md/blue/dsk/d0       Block volume d0 in disk set blue
/dev/md/blue/dsk/d1       Block volume d1 in disk set blue
/dev/md/blue/rdsk/d126    Raw volume d126 in disk set blue
/dev/md/blue/rdsk/d127    Raw volume d127 in disk set blue
Similarly, hot spare pools have the disk set name as part of the hot spare name.
- Solaris Volume Manager must be configured on each host that will be connected to the disk set.
- Each host must have its local state database set up before you can create disk sets.
- To create and work with a disk set in a clustering environment, root must be a member of Group 14, or the /.rhosts file must contain an entry for the other host name (on each host).
- To perform maintenance on a disk set, a host must be the owner of the disk set or have reserved the disk set. (A host takes implicit ownership of the disk set by putting the first drives into the set.)
- You cannot add a drive that is in use to a disk set. Before you add a drive, make sure it is not currently being used for a file system, database, or any other application.
- Do not add a drive with existing data that you want to preserve to a disk set. The process of adding the disk to the disk set repartitions the disk and destroys existing data.
- All disks that you plan to share between hosts in the disk set must be connected to each host and must have the exact same path, driver, and name on each host. Specifically, a shared disk drive must be seen on both hosts at the same device number (c#t#d#). If the numbers are not the same on both hosts, you will see the message drive c#t#d# is not common with host xxx when attempting to add drives to the disk set. The shared disks must use the same driver name (ssd). See How to Add Drives to a Disk Set on page 215 for more information on setting up shared disk drives in a disk set.
The default total number of disk sets on a system is 4. You can increase this value up to 32 by editing the /kernel/drv/md.conf file, as described in How to Increase the Number of Default Disk Sets on page 238. The number of shared disk sets is always one less than the md_nsets value, because the local set is included in md_nsets.

Unlike local volume administration, it is not necessary to create or delete state database replicas manually on the disk set. Solaris Volume Manager tries to balance a reasonable number of replicas across all drives in a disk set. When drives are added to a disk set, Solaris Volume Manager re-balances the state database replicas across the remaining drives. Later, if necessary, you can change the replica layout with the metadb command.
- Safely - When you safely reserve a disk set, Solaris Volume Manager attempts to take the disk set, and the other host attempts to release the disk set. The release (and therefore the reservation) might fail.
- Forcibly - When you forcibly reserve a disk set, Solaris Volume Manager reserves the disk set whether or not another host currently has the set reserved. This method is generally used when a host in the disk set is down or not communicating. All disks within the disk set are taken over. The state database is read in on the host performing the reservation and the shared volumes configured in the disk set become accessible. If the other host had the disk set reserved at this point, it would panic due to reservation loss.

Normally, two hosts in a disk set cooperate with each other to ensure that drives in a disk set are reserved by only one host at a time. A normal situation is defined as both hosts being up and communicating with each other.
Note If a drive has been determined unexpectedly not to be reserved (perhaps because another host using the disk set forcibly took the drive), the host will panic. This behavior helps to minimize data loss which would occur if two hosts were to simultaneously access the same drive.
For more information about taking or reserving a disk set, see How to Take a Disk Set on page 221.
Scenario - Disk Sets
The following example, drawing on the sample system shown in Chapter 4, describes how disk sets should be used to manage storage that resides on a SAN (Storage Area Network) fabric. Assume that the sample system has an additional controller that connects to a fiber switch and SAN storage. Storage on the SAN fabric is not available to the system as early in the boot process as other devices, such as SCSI and IDE disks, so Solaris Volume Manager would report logical volumes on the fabric as unavailable at boot. However, by adding the storage to a disk set, and then using the disk set tools to manage the storage, this problem with boot-time availability is avoided. In addition, the fabric-attached storage can be managed within a separate namespace, controlled by the disk set, apart from the local storage.
CHAPTER 20

Disk Sets (Tasks)
Create a disk set
    Description: Use the Solaris Volume Manager GUI or the metaset command to create a disk set.
    Instructions: How to Create a Disk Set on page 214

Add drives to a disk set
    Description: Use the Solaris Volume Manager GUI or the metaset command to add drives to a disk set.
    Instructions: How to Add Drives to a Disk Set on page 215

Add a host to a disk set
    Description: Use the Solaris Volume Manager GUI or the metaset command to add a host to a disk set.
    Instructions: How to Add a Host to a Disk Set on page 217

Create Solaris Volume Manager components in a disk set
    Description: Use the Solaris Volume Manager GUI or the metainit command to create volumes in a disk set.
    Instructions: How to Create Solaris Volume Manager Components in a Disk Set on page 218

Check the status of a disk set
    Description: Use the Solaris Volume Manager GUI or the metaset and metastat commands to check the status of a disk set.

Remove disks from a disk set
    Description: Use the Solaris Volume Manager GUI or the metaset command to remove drives from a disk set.
    Instructions: How to Remove Disks from a Disk Set on page 220

Take a disk set
    Description: Use the Solaris Volume Manager GUI or the metaset command to take a disk set.
    Instructions: How to Take a Disk Set on page 221

Release a disk set
    Description: Use the Solaris Volume Manager GUI or the metaset command to release a disk set.
    Instructions: How to Release a Disk Set on page 222

Delete a host from a disk set
    Description: Use the Solaris Volume Manager GUI or the metaset command to delete hosts from a disk set.
    Instructions: How to Delete a Host or Disk Set on page 224

Delete a disk set
    Description: Use the Solaris Volume Manager GUI or the metaset command to delete the last host from a disk set, thus deleting the disk set.
    Instructions: How to Delete a Host or Disk Set on page 224
From the Enhanced Storage tool within the Solaris Management Console, open the Disk Sets node. Choose Action->Create Disk Set, then follow the instructions in the wizard. For more information, see the online help. To create a disk set from scratch from the command line, use the following form of the metaset command:

metaset [-s diskset-name] [-a] [-h hostname]
-s diskset-name    Specifies the name of a disk set on which the metaset command will work.

-a                 Adds hosts to the named disk set. Solaris Volume Manager supports four hosts per disk set.

-h hostname        Specifies one or more hosts to be added to a disk set. Adding the first host creates the set. The second host can be added later, but it is not accepted if all the drives within the set cannot be found on the specified hostname. hostname is the same name found in the /etc/nodename file.
See metaset(1M) for more information. 3. Check the status of the new disk set by using the metaset command.
# metaset
In this example, you create a shared disk set called blue, from the host lexicon. The metaset command shows the status. At this point, the set has no owner. The host that adds disks to the set will become the owner by default.
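A sketch of the commands for this example; the set name blue and the host name lexicon come from the text above, and the second command simply reruns metaset to show the status:

# metaset -s blue -a -h lexicon
# metaset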
Only drives that meet the following conditions can be added to a disk set:

- The drive must not be in use in a volume or hot spare pool, or contain a state database replica.
- The drive must not be currently mounted, swapped on, or otherwise opened for use by an application.
1. Check Background Information for Disk Sets on page 208. 2. To add drives to a disk set, use one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Disk Sets node. Select the disk set you want to modify, then right-click and choose Properties. Select the Disks tab, click Add Disk, then follow the instructions in the wizard. For more information, see the online help. To add drives to a disk set from the command line, use the following form of the metaset command:

metaset [-s diskset-name] [-a] [disk-name]

-s diskset-name    Specifies the name of a disk set on which the metaset command will work.

-a                 Adds drives to the named disk set.

disk-name          Specifies the drives to add to the disk set. Drive names are in the form cxtxdx; no sx slice identifiers are at the end of the name. They need to be the same as seen from all hosts in the disk set.

See the metaset man page (metaset(1M)) for more information. The first host to add a drive to a disk set becomes the owner of the disk set.
Caution Do not add a disk with data; the process of adding it to the disk set might repartition the disk, destroying any data. For more information, see Example - Two Shared Disk Sets on page 207.
3. Use the metaset command to verify the status of the disk set and drives.
# metaset
In this example, the host name is lexicon. The shared disk set is blue. At this point, only one disk has been added to the disk set blue. Optionally, you could add multiple disks at once by listing each of them on the command line. For example, you could use the following:
# metaset -s blue -a c1t6d0 c2t6d0
From the Enhanced Storage tool within the Solaris Management Console, open the Disk Sets node and choose the disk set you want to modify. Select the disk set you want to modify, then right-click and choose Properties. Select the Hosts tab, click Add Host, then follow the instructions in the wizard. For more information, see the online help. To add hosts to a disk set from the command line, use the following form of the metaset command:

metaset [-s diskset-name] [-a] [-h hostname]

-s diskset-name    Specifies the name of a disk set on which metaset will work.

-a                 Adds hosts to the named disk set.

-h hostname        Specifies one or more host names to be added to the disk set. Adding the first host creates the set. The host name is the same name found in the /etc/nodename file.
See the metaset man page (metaset(1M)) for more information. 3. Verify that the host has been added to the disk set by using the metaset command without any options.
# metaset
This example shows the addition of host idiom to the disk set blue.
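A sketch of the command behind this example, using the set and host names given above:

# metaset -s blue -a -h idiom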
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes, State Database Replicas, or Hot Spare Pools node. Choose Action->Create, then follow the instructions in the wizard. For more information, see the online help. Use the command line utilities with the same basic syntax you would without a disk set, but add -s diskset-name immediately after the command for every command.
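As a sketch of this syntax, the disk set mirror shown in the output that follows could be built with commands like the ones below. The slice names are taken from that output; the exact sequence and the confirmation messages are an illustration rather than a recorded session.

# metainit -s blue d11 1 1 c1t6d0s0
blue/d11: Concat/Stripe is setup
# metainit -s blue d12 1 1 c2t6d0s0
blue/d12: Concat/Stripe is setup
# metainit -s blue d10 -m d11
blue/d10: Mirror is setup
# metattach -s blue d10 d12
blue/d10: submirror blue/d12 is attached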
# metastat -s blue
blue/d10: Mirror
    Submirror 0: blue/d11
      State: Okay
    Submirror 1: blue/d12
      State: Resyncing
    Resync in progress: 0 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 17674902 blocks

blue/d11: Submirror of blue/d10
    State: Okay
    Size: 17674902 blocks
    Stripe 0:
        Device      Start Block   Reloc   Hot Spare
        c1t6d0s0              0

blue/d12: Submirror of blue/d10
    State: Resyncing
    Size: 17674902 blocks
    Stripe 0:
        Device      Start Block   Reloc   Hot Spare
        c2t6d0s0              0
This example shows the creation of a mirror, d10, in disk set blue, that consists of submirrors (RAID 0 devices) d11 and d12.
From the Enhanced Storage tool within the Solaris Management Console, open the Disk Sets node. Right-click the Disk Set you want to monitor, then choose Properties from the menu. For more information, see the online help. Use the metaset command to view disk set status. See metaset(1M) for more information.
Set name = blue, Set number = 1

Host                Owner
  idiom              Yes

Drive               Dbase
  c1t6d0             Yes
  c2t6d0             Yes
The metaset command with the -s option followed by the name of the blue disk set displays status information for that disk set. By issuing the metaset command from the owning host, idiom, it is determined that idiom is in fact the disk set owner. The metaset command also displays the drives in the disk set. The metaset command by itself displays the status of all disk sets.
From the Enhanced Storage tool within the Solaris Management Console, open the Disk Sets node. Right-click the Disk Set you want to release, then choose Properties from the menu. Click the Disks tab and follow the instructions in the online help. Use the following form of the metaset command:
metaset -s diskset-name -d drive-name

-s diskset-name    Specifies the name of a disk set on which the metaset command will work.

-d drive-name      Specifies the drives to delete from the disk set. Drive names are in the form cxtxdx; no sx slice identifiers are at the end of the name.

Verify that the drive has been deleted from the disk set by using the metaset -s diskset-name command.
# metaset -s blue
This example deletes the disk from the disk set blue.
From the Enhanced Storage tool within the Solaris Management Console, open the Disk Sets node. Right-click the Disk Set you want to take, then choose Take Ownership from the menu. For more information, see the online help. Use the following form of the metaset command.
metaset -s diskset-name -t [-f]

-s diskset-name    Specifies the name of a disk set on which the metaset command will work.

-t                 Specifies to take the disk set.

-f                 Specifies to take the disk set forcibly.
See the metaset(1M) man page for more information. When one host in a disk set takes the disk set, the other host in the disk set cannot access data on drives in the disk set. The default behavior of the metaset command takes the disk set for your host only if a release is possible on the other host. Use the -f option to forcibly take the disk set. This option takes the disk set whether or not another host currently has the set. Use this method when a host in the disk set is down or not communicating. If the other host had the disk set taken at this point, it would panic when it attempts to perform an I/O operation to the disk set.
In this example, host lexicon communicates with host idiom and ensures that host idiom has released the disk set before host lexicon attempts to take the set. In this example, if host idiom owned the set blue, the Owner column in the above output would still have been blank. The metaset command only shows whether the issuing host owns the disk set, and not the other host.
In this example, the host that is taking the disk set does not communicate with the other host. Instead, the drives in the disk set are taken without warning. If the other host had the disk set, it would panic when it attempts an I/O operation to the disk set.
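The commands behind these two examples would look like the following sketch, with the disk set name blue carried over from the earlier examples:

# metaset -s blue -t
(safe take, which first asks the other host to release the set)

# metaset -s blue -t -f
(forcible take, used when the other host is down or not communicating)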
1. Check Background Information for Disk Sets on page 208. 2. Use one of the following methods to release a disk set.
From the Enhanced Storage tool within the Solaris Management Console, open the Disk Sets node. Right-click the Disk Set you want to release, then choose Release Ownership from the menu. For more information, see the online help. To release ownership of the disk set, use the following form of the metaset command.
metaset -s diskset-name -r

-s diskset-name    Specifies the name of a disk set on which the metaset command will work.

-r                 Releases ownership of a disk set. The reservation of all the disks within the disk set is removed. The volumes within the disk set are no longer accessible.
3. Verify that the disk set has been released on this host by using the metaset command without any options.
# metaset
This example shows the release of the disk set blue. Note that there is no owner of the disk set. Viewing status from host lexicon could be misleading. A host can only determine if it does or does not own a disk set. For example, if host idiom were to reserve the disk set, it would not appear so from host lexicon. Only host idiom would be able to determine the reservation in this case.
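A sketch of the commands for this example, using the disk set name blue from the surrounding text:

# metaset -s blue -r
# metaset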
From the Enhanced Storage tool within the Solaris Management Console, open the Disk Sets node. Right-click the Disk Set you want to release, then choose Delete from the menu. Follow the instructions in the online help. To delete the host and remove the disk set if the host removed is the last host on the disk set, use the following form of the metaset command.
metaset -s diskset-name -d -h hostname
-s diskset-name    Specifies the name of a disk set on which the metaset command will work.

-d                 Deletes a host from a disk set.

-h hostname        Specifies the name of the host to delete.
See the metaset(1M) man page for more information.

2. Verify that the host has been deleted from the disk set by using the metaset command. Note that only the current (owning) host is shown. Other hosts have been deleted.
# metaset -s blue
Set name = blue, Set number = 1

Host                Owner
  lexicon            Yes

Drive               Dbase
  c1t2d0             Yes
  c1t3d0             Yes
  c1t4d0             Yes
  c1t5d0             Yes
  c1t6d0             Yes
  c2t1d0             Yes
This example shows the deletion of the last host from the disk set blue.
CHAPTER 21

Maintaining Solaris Volume Manager (Tasks)
Solaris Volume Manager Maintenance (Task Map) on page 227
Viewing the Solaris Volume Manager Configuration on page 228
Renaming Volumes on page 232
Working with Configuration Files on page 235
Changing Solaris Volume Manager Defaults on page 237
Expanding a File System With the growfs Command on page 239
Overview of Replacing and Enabling Components in RAID 1 and RAID 5 Volumes on page 241
View the Solaris Volume Manager configuration
    Description: Use the Solaris Volume Manager GUI or the metastat command to view the system configuration.
    Instructions: How to View the Solaris Volume Manager Volume Configuration on page 229

Rename a volume
    Description: Use the Solaris Volume Manager GUI or the metarename command to rename a volume.
    Instructions: How to Rename a Volume on page 234

Create configuration files
    Description: Use the metastat -p command and the metadb command to create configuration files.
    Instructions: How to Create Configuration Files on page 235

Initialize Solaris Volume Manager from configuration files
    Description: Use the metainit command to initialize Solaris Volume Manager from configuration files.
    Instructions: How to Initialize Solaris Volume Manager From a Configuration File on page 235

Increase the number of possible volumes
    Description: Edit the /kernel/drv/md.conf file to increase the number of possible volumes.
    Instructions: How to Increase the Number of Default Volumes on page 237

Increase the number of possible disk sets
    Description: Edit the /kernel/drv/md.conf file to increase the number of possible disk sets.
    Instructions: How to Increase the Number of Default Disk Sets on page 238

Grow a file system
    Description: Use the growfs command to grow a file system.
    Instructions: How to Expand a File System on page 240

Enable components
    Description: Use the Solaris Volume Manager GUI or the metareplace command to enable components.
    Instructions: Enabling a Component on page 242

Replace components
    Description: Use the Solaris Volume Manager GUI or the metareplace command to replace components.
    Instructions: Replacing a Component With Another Available Component on page 242
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node. For more information, see the online help. Use the following format of the metastat command:
metastat -p -i component-name
- -p specifies to output a condensed summary, suitable for use in creating the md.tab file.
- -i specifies to verify that all devices can be accessed.
- component-name is the name of the volume to view. If no volume name is specified, a complete list of components will be displayed.
# metastat
...
d80: Soft Partition
    Device: d70
    State: Okay
    Size: 2097152 blocks

d81: Soft Partition
    Device: d70
    State: Okay
    Size: 2097152 blocks

d70: Mirror
    Submirror 0: d71
      State: Okay
    Submirror 1: d72
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12593637 blocks

d71: Submirror of d70
    State: Okay
    Size: 12593637 blocks
    Stripe 0:
        Device     c1t3d0s3
    Stripe 1:
        Device     c1t3d0s4
    Stripe 2:
        Device     c1t3d0s5

d72: Submirror of d70
    State: Okay
    Size: 12593637 blocks
    Stripe 0:
        Device     c2t3d0s3
    Stripe 1:
        Device     c2t3d0s4
    Stripe 2:
        Device     c2t3d0s5

hsp010: is empty

hsp014: 2 hot spares
        c1t2d0s1
        c2t2d0s1

hsp050: 2 hot spares
        c1t2d0s5
        c2t2d0s5

hsp070: 2 hot spares
        c1t2d0s4
        c2t2d0s4

Device Relocation Information:
Device       Reloc
c1t2d0       Yes
c2t2d0       Yes
c1t1d0       Yes
c2t1d0       Yes
c0t0d0       Yes
        Device        Start Block   Dbase   Reloc
        c27t8d2s0           16384   No      Yes
        c4t7d1s0            16384   No      Yes
    Stripe 3: (interlace: 32 blocks)
        Device        Start Block   Dbase   Reloc
        c10t7d0s0           32768   No      Yes
        c11t5d0s0           32768   No      Yes
        c12t2d1s0           32768   No      Yes
        c14t1d0s0           32768   No      Yes
        c15t8d1s0           32768   No      Yes
        c17t3d0s0           32768   No      Yes
        c18t6d1s0           32768   No      Yes
        c19t4d1s0           32768   No      Yes
        c1t5d0s0            32768   No      Yes
        c2t6d1s0            32768   No      Yes
        c3t4d1s0            32768   No      Yes
        c5t2d1s0            32768   No      Yes
        c6t1d0s0            32768   No      Yes
        c8t3d0s0            32768   No      Yes
Renaming Volumes
Background Information for Renaming Volumes
The metarename command with the -x option can exchange the names of volumes that have a parent-child relationship. For more information, see How to Rename a Volume on page 234 and the metarename(1M) man page. Solaris Volume Manager enables you to rename most types of volumes at any time, subject to some constraints. Renaming volumes or switching volume names is an administrative convenience for management of volume names. For example, you could arrange all file system mount points in a desired numeric range. You might rename volumes to maintain a naming scheme for your logical volumes or to allow a transactional volume to use the same name as the underlying volume had been using.
Before you rename a volume, make sure that it is not currently in use. For a file system, make sure it is not mounted or being used as swap. Other applications using the raw device, such as a database, should have their own way of stopping access to the data. Specific considerations for renaming volumes include the following:
- Soft partitions
- Volumes on which soft partitions are directly built
- Volumes that are being used as log devices
- Hot spare pools
You can rename volumes within a disk set. However, you cannot rename volumes to move them from one disk set to another.
You can use either the Enhanced Storage tool within the Solaris Management Console or the command line (the metarename(1M) command) to rename volumes.
The metarename -x command can make it easier to mirror or unmirror an existing volume, and to create or remove a transactional volume of an existing volume.
- You cannot rename a volume that is currently in use. This includes volumes that are used as mounted file systems, as swap, or as active storage for applications or databases. Thus, before you use the metarename command, stop all access to the volume being renamed. For example, unmount a mounted file system.
- You cannot exchange volumes in a failed state, or volumes using a hot spare replacement.
- An exchange can only take place between volumes with a direct parent-child relationship. You could not, for example, directly exchange a stripe in a mirror that is a master device with the transactional volume.
- You must use the -f (force) flag when exchanging members of a transactional device.
- You cannot exchange (or rename) a logging device. The workaround is to either detach the logging device, rename it, then reattach it to the transactional device; or detach the logging device and attach another logging device of the desired name.
- Only volumes can be exchanged. You cannot exchange slices or hot spares.
Caution Solaris Volume Manager transactional volumes do not support large volumes. In all cases, UFS logging (see mount_ufs(1M)) provides better performance than transactional volumes, and UFS logging does support large volumes as well.
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node and select the volume you want to rename. Right-click the icon and choose the Properties option, then follow the instructions on screen. For more information, see the online help. Use the following format of the metarename command:
metarename old-volume-name new-volume-name
old-volume-name is the name of the existing volume.

new-volume-name is the new name for the existing volume.
See metarename(1M) for more information.

4. Edit the /etc/vfstab file to refer to the new volume name, if necessary.

5. Remount the file system.
In this example, the volume d10 is renamed to d100. Because d10 contains a mounted file system, the file system must be unmounted before the rename can occur. If the volume is used for a file system with an entry in the /etc/vfstab file, the entry must be changed to reference the new volume name d100, as sketched below.
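A minimal sketch of the steps behind this example; the mount point /home2 is a placeholder, because the original example does not name it:

# umount /home2
# metarename d10 d100
(edit the /etc/vfstab entry for the file system so that it references d100)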
Then, the file system should be remounted. If you have an existing mirror or transactional volume, you can use the metarename -x command to remove the mirror or transactional volume and keep data on the underlying volume. For a transactional volume, as long as the master device is a volume (RAID 0, RAID 1, or RAID 5 volume), you can keep data on that volume.
Once you have defined all appropriate parameters for your Solaris Volume Manager environment, use the metastat -p command to create the /etc/lvm/md.tab file.
# metastat -p > /etc/lvm/md.tab
This file contains all parameters for use by the metainit and metahs commands, in case you need to set up several similar environments or re-create the configuration after a system failure. For more information about the md.tab file, see Overview of the md.tab File on page 305.
If your system loses the information maintained in the state database (for example, because the system was rebooted after all state database replicas were deleted), and as long as no volumes were created since the state database was lost, you can use the md.cf or md.tab files to recover your Solaris Volume Manager configuration.
Note The md.cf file does not maintain information on active hot spares. Thus, if hot spares were in use when the Solaris Volume Manager configuration was lost, those volumes that were using active hot spares will likely be corrupted.
For more information about these files, see md.cf(4) and md.tab(4).

1. Create state database replicas. See Creating State Database Replicas on page 62 for more information.

2. Create, or update, the /etc/lvm/md.tab file.
- If you are attempting to recover the last known Solaris Volume Manager configuration, copy the md.cf file to the md.tab file.
- If you are creating a new Solaris Volume Manager configuration based on a copy of the md.tab file that you preserved, put a copy of your preserved file at /etc/lvm/md.tab.
- If you are creating a new configuration or recovering a configuration after a crash, configure the mirrors as one-way mirrors. If a mirror's submirrors are not the same size, be sure to use the smallest submirror for this one-way mirror. Otherwise, data could be lost.
- If you are recovering an existing configuration and Solaris Volume Manager was cleanly stopped, leave the mirror configuration as multi-way mirrors.
- Specify RAID 5 volumes with the -k option, to prevent reinitialization of the device. See the metainit(1M) man page for more information.
4. Check the syntax of the md.tab file entries without committing changes by using the following form of the metainit command:
# metainit -n -a component-name
The metainit command does not maintain a hypothetical state of the devices that might have been created while running with the -n option. As a result, creating volumes that rely on other, not-yet-created volumes will produce errors with -n, even though the same command might succeed without the -n option.
- -n specifies not to actually create the devices. Use this option to verify that the results will be as you expect.
- -a specifies to activate the devices.
- component-name specifies the name of the component to initialize. If no component is specified, all components will be created.
5. If no problems were apparent from the previous step, re-create the volumes and hot spare pools from the md.tab file:
# metainit -a component-name
- -a specifies to activate the devices.
- component-name specifies the name of the component to initialize. If no component is specified, all components will be created.
6. As needed, make the one-way mirrors into multi-way mirrors by using the metattach command. 7. Validate the data on the volumes.
Solaris Volume Manager uses the following default values:

- 128 volumes per disk set
- 4 disk sets
- State database replica maximum size of 8192 blocks
The values of total volumes and number of disk sets can be changed if necessary, and the tasks in this section tell you how.
1. Check the prerequisites (Prerequisites for Troubleshooting the System on page 278).

2. Edit the /kernel/drv/md.conf file.

3. Change the value of the nmd field. Values up to 8192 are supported.

4. Save your changes.

5. Perform a reconfiguration reboot to build the volume names.
# reboot -- -r
Example - md.conf File
Here is a sample md.conf file that is configured for 256 volumes.
#
#ident "@(#)md.conf 1.7 94/04/04 SMI"
#
# Copyright (c) 1992, 1993, 1994 by Sun Microsystems, Inc.
#
#
#pragma ident "@(#)md.conf 2.1 00/07/07 SMI"
#
# Copyright (c) 1992-1999 by Sun Microsystems, Inc.
# All rights reserved.
#
name="md" parent="pseudo" nmd=256 md_nsets=4;
1. Check the prerequisites (Prerequisites for Troubleshooting the System on page 278).

2. Edit the /kernel/drv/md.conf file.

3. Change the value of the md_nsets field. Values up to 32 are supported.

4. Save your changes.
Example - md.conf File
Here is a sample md.conf file that is configured for five shared disk sets. The value of md_nsets is six, which results in five shared disk sets and the one local disk set.
#
#
#pragma ident "@(#)md.conf 2.1 00/07/07 SMI"
#
# Copyright (c) 1992-1999 by Sun Microsystems, Inc.
# All rights reserved.
#
name="md" parent="pseudo" nmd=128 md_nsets=6;
# Begin MDD database info (do not edit)
...
# End MDD database info (do not edit)
A volume can be expanded regardless of whether it is used for a file system, application, or database. So, you can expand RAID 0 (stripe and concatenation) volumes, RAID 1 (mirror) volumes, and RAID 5 volumes, as well as soft partitions. You can concatenate a volume that contains an existing file system while the file system is in use. Then, as long as the file system is UFS, it can be expanded (with the growfs command) to fill the larger space without interrupting read access to the data. Once a file system is expanded, it cannot be shrunk, due to constraints in UFS.
Applications and databases that use the raw device must have their own method to grow the added space so that they can recognize it. Solaris Volume Manager does not provide this capability.
When a component is added to a RAID 5 volume, it becomes a concatenation to the device. The new component does not contain parity information. However, data on the new component is protected by the overall parity calculation that takes place for the volume.
You can expand a log device by adding additional components. You do not need to run the growfs command, as Solaris Volume Manager automatically recognizes the additional space on reboot. Soft partitions can be expanded by adding space from the underlying volume or slice. All other volumes can be expanded by adding slices.
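As a sketch of this process, assuming a concatenation named d8 that holds a mounted UFS file system at /files and an unused slice c0t2d0s5 (all names here are for illustration only), expanding the volume and then the file system might look like this:
# metattach d8 c0t2d0s5
# growfs -M /files /dev/md/rdsk/d8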
See the following example and the growfs(1M) man page for more information.
/dev/md/dsk/d10        69047   65426       0   100%    /home2
...
# growfs -M /home2 /dev/md/rdsk/d10
/dev/md/rdsk/d10:       295200 sectors in 240 cylinders of 15 tracks, 82 sectors
        144.1MB in 15 cyl groups (16 c/g, 9.61MB/g, 4608 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 19808, 39584, 59360, 79136, 98912, 118688, 138464, 158240, 178016,
 197792, 217568, 237344, 257120, 276896,
# df -k
Filesystem            kbytes    used   avail capacity  Mounted on
...
/dev/md/dsk/d10       138703   65426   59407    53%    /home2
...
In this example, a new slice was added to a volume, d10, which contains the mounted file system /home2. The growfs command specifies the mount point with the -M option to be /home2, which is expanded onto the raw volume /dev/md/rdsk/d10. The file system will span the entire volume when the growfs command is complete. You can use the df -hk command before and after to verify the total disk capacity. For mirror and transactional volumes, always run the growfs command on the top-level volume, not a submirror or master device, even though space is added to the submirror or master device.
Note When recovering from disk errors, scan /var/adm/messages to see what kind of errors occurred. If the errors are transitory and the disks themselves do not have problems, try enabling the failed components. You can also use the format command to test a disk.
Enabling a Component
You can enable a component when any of the following conditions exist:
Solaris Volume Manager could not access the physical drive. This problem might have occurred, for example, due to a power loss or a loose drive cable. In this case, Solaris Volume Manager puts the components in the Maintenance state. You need to make sure that the drive is accessible (restore power, reattach cables, and so on), and then enable the components in the volumes.
You suspect that a physical drive is having transitory problems that are not disk-related. You might be able to fix a component in the Maintenance state by simply enabling it. If this does not fix the problem, then you need to either physically replace the disk drive and enable the component, or replace the component with another available component on the system. When you physically replace a drive, be sure to partition it like the old drive to ensure adequate space on each used component.
Note Always check for state database replicas and hot spares on the drive being replaced. Any state database replica shown to be in error should be deleted before replacing the disk. Then after enabling the component, they should be re-created (at the same size). You should treat hot spares in the same manner.
A disk drive has problems, and you do not have a replacement drive, but you do have available components elsewhere on the system. You might want to use this strategy if a replacement is absolutely necessary but you do not want to shut down the system.
Physical disks might report soft errors even though Solaris Volume Manager shows the mirror/submirror or RAID 5 volume in the Okay state. Replacing the component in question with another available component enables you to perform preventative maintenance and potentially prevent hard errors from occurring.
You want to do performance tuning. For example, by using the performance monitoring feature available from the Enhanced Storage tool within the Solaris Management Console, you see that a particular component in a RAID 5 volume is experiencing a high load average, even though it is in the Okay state. To balance the load on the volume, you can replace that component with a component from a disk that is less utilized. You can perform this type of replacement online without interrupting service to the volume.
The RAID 5 volume has become a read-only device, and you need to perform some type of error recovery so that the state of the RAID 5 volume is stable and the possibility of data loss is reduced. If a RAID 5 volume reaches a Last Erred state, there is a good chance it has lost data. Be sure to validate the data on the RAID 5 volume after you repair it.
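As a minimal sketch of enabling a component after the underlying problem has been corrected, assuming a volume named d10 and a component c0t1d0s4 (hypothetical names), the command might be:
# metareplace -e d10 c0t1d0s4
The -e option re-enables the existing component in place rather than substituting a different slice.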
Background Information For Replacing and Enabling Slices in RAID 1 and RAID 5 Volumes
When you replace components in a mirror or a RAID 5 volume, follow these guidelines:
Always replace components in the Maintenance state first, followed by those components in the Last Erred state.
After a component is replaced and resynchronized, use the metastat command to verify the volume's state, then validate the data to make sure it is good. Replacing or enabling a component in the Last Erred state usually means that some data has been lost. Be sure to validate the data on the volume after you repair it. For a UFS, run the fsck command to validate the metadata (the structure of the file system), then check the actual user data. (Practically, users will have to examine their files.) A database or other application must have its own way of validating its internal data structure.
Always check for state database replicas and hot spares when you replace components. Any state database replica shown to be in error should be deleted before you replace the physical disk. The state database replica should be added back before enabling the component. The same procedure applies to hot spares.
RAID 5 volumes: During component replacement, data is recovered, either from a hot spare currently in use, or using the RAID level 5 parity, when no hot spare is in use.
RAID 1 volumes: When you replace a component, Solaris Volume Manager automatically starts resynchronizing the new component with the rest of the mirror. When the resynchronization completes, the replaced component becomes readable and writable. If the failed component has been replaced with data from a hot spare, the hot spare is placed in the Available state and made available for other hot spare replacements.
The new component must be large enough to replace the old component.
As a precaution, back up all data before you replace Last Erred devices.
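The replacement itself is done with the metareplace command. A minimal sketch, assuming mirror d1 with a failed component c0t3d0s1 and a spare slice c0t2d0s1 (hypothetical names), might be:
# metastat d1
# metareplace d1 c0t3d0s1 c0t2d0s1
The metastat command confirms which component needs maintenance, and metareplace substitutes the new slice and begins the resynchronization.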
Note A submirror or RAID 5 volume might be using a hot spare in place of a failed component. When that failed component is enabled or replaced by using the procedures in this section, the hot spare is marked Available in the hot spare pool, and is ready for use.
Chapter 22
Deploying Small Servers on page 245
Using Solaris Volume Manager With Networked Storage Devices on page 247
As a starting point, consider a Netra with a single SCSI bus and two internal disks, an off-the-shelf configuration and a good starting point for distributed servers. Solaris Volume Manager could easily be used to mirror some or all of the slices, thus providing redundant storage to help guard against disk failure. See the following figure for an example.
FIGURE 22-1 Small system with a single SCSI controller and two internal disks, c0t0d0 and c0t1d0 (diagram not reproduced)
A configuration like this example might include mirrors for the root (/), /usr, swap, /var, and /export file systems, plus state database replicas (one per disk). As such, a failure of either side of any of the mirrors would not necessarily result in system failure, and up to five discrete failures could possibly be tolerated. However, the system is not sufficiently protected against disk or slice failure. A variety of potential failures could result in a complete system failure, requiring operator intervention. While this configuration does help provide some protection against catastrophic disk failure, it exposes key possible single points of failure:
The single SCSI controller represents a potential point of failure. If the controller fails, the system will be down, pending replacement of the part.
The two disks do not provide adequate distribution of state database replicas. The majority consensus algorithm requires that half of the state database replicas be available for the system to continue to run, and half plus one replica for a reboot. So, if one state database replica were on each disk and one disk or the slice containing the replica failed, the system could not reboot (thus making a mirrored root ineffective). If two or more state database replicas were on each disk, a single slice failure would likely not be problematic, but a disk failure would still prevent a reboot. If a different number of replicas were on each disk, one disk would have more than half and the other would have fewer than half. If the disk with fewer replicas failed, the system could reboot and continue. However, if the disk with more replicas failed, the system would immediately panic.
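As a sketch of placing two replicas on each disk, assuming slice 7 on each internal disk is reserved for this purpose (slice names here are for illustration only), the command might be:
# metadb -a -c 2 c0t0d0s7 c0t1d0s7
This creates two replicas on each of the two slices, for a total of four.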
A Best Practices approach would be to modify the configuration by adding one more controller and one more hard drive. The resulting configuration could be far more resilient.
Solaris Volume Manager RAID 1 volumes built on underlying hardware storage devices are not RAID 1+0, as Solaris Volume Manager cannot understand the underlying storage well enough to offer RAID 1+0 capabilities. However, configuring soft partitions on top of a Solaris Volume Manager RAID 1 volume, built in turn on a hardware RAID 5 device, is a very flexible and resilient configuration.
Chapter 23 Automatic (Top Down) Volume Creation (Tasks)
Top Down Volume Creation (Task Map) on page 249
Overview Of Top Down Volume Creation on page 250
Before You Begin on page 252
Understanding Which Disks Are Available on page 253
Creating Volumes Automatically on page 253
Analyzing Volume Creation with the metassist Command on page 255
Changing Default Behavior of the metassist Command on page 260
Task: Creating Volumes Automatically on page 253
Description: Allows you to use the metassist command to create one or more Solaris Volume Manager volumes.

Task: Specifying Output Verbosity from the metassist Command on page 255
Description: Allows you to control the amount of information about the volume creation process that the metassist command provides for troubleshooting or diagnosis.

Task: Creating a Command File with the metassist Command on page 257
Description: Helps you create a shell script with the metassist command to generate the volumes that the command specified.

Task: Creating a Volume with a Saved Shell Script Created by the metassist Command on page 258
Description: Shows you how to create the Solaris Volume Manager volumes that the metassist command specified with the shell script previously generated by the command.

Task: Creating a Volume Configuration File with the metassist Command on page 259
Description: Helps you create a volume configuration file, describing the characteristics of the volumes you want to create.

Task: Changing the Volume Defaults File on page 260
Description: Allows you to set default volume characteristics to customize the metassist command's behavior.
size
redundancy (number of copies of data)
data paths
fault recovery (whether the volume should be associated with a hot spare pool)
For cases in which it is important to define the volume characteristics more specifically (or the constraints under which the volumes should be created), you can also specify:
volume types (for example, RAID 0 (concatenation) or RAID 0 (stripe))
components to use in specific volumes
components that are available or unavailable for use
number of components to use
details specific to the type of volume being created (including interlace value for stripes, read policy for mirrors, and similar characteristics)
Additionally, the system administrator can constrain the command to use (or not use) specific disks or paths.
FIGURE 23-1 The metassist command supports end-to-end processing, based on command line or files, or partial processing to allow the system administrator to provide file-based data or check volume characteristics. (The diagram shows the flow from the command line or a volume request, through metassist processing, to a volume specification, a command file, or complete volumes.)
For an automatic, hands-off approach to volume creation, use the command line to specify the quality of service attributes you require, and allow the metassist command to create the necessary volumes for you. This could be as simple as:
# metassist create -s storagepool -S 10Gb
This command would create a stripe volume of 10Gb in size in the storagepool disk set, using available storage existing in the storagepool disk set. Alternatively, you can use a volume request file to define characteristics of a volume, then use the metassist command to implement it. As shown in Figure 23-1, a volume specification file can be produced, so the system administrator can assess the intended implementation or edit it if needed. This volume specification file can then be used as input to the metassist command to create volumes. The command file shown in Figure 23-1 is a shell script that implements the Solaris Volume Manager device configuration that the metassist command specifies. A system administrator can use that file for repeated creation, edit it as appropriate, or skip that step completely and create the volumes directly.
To use the metassist command, you need the following:
Root access, or an equivalent role. See Becoming Superuser (root) or Assuming a Role in System Administration Guide: Basic Administration for more information.
State database replicas, distributed appropriately for your system. See About the Solaris Volume Manager State Database and Replicas on page 53 for more information about state database replicas.
Available disks to use for the volumes you will create. The metassist command uses disk sets to help manage storage, so complete disks (or an existing disk set) must be available to create new volumes with the metassist command.
The metassist command does not use the following kinds of slices:
Disks used in other disk sets
Mounted slices
Slices with a file system superblock, indicating a mountable file system
Slices used in other Solaris Volume Manager volumes
Any slices that meet these criteria are unavailable for use by the metassist command.
1. Make sure that you have the necessary prerequisites for using top down volume creation (the metassist command).
2. Identify available storage on which to create the mirror. If you do not explicitly specify any storage, Solaris Volume Manager will identify unused storage on the system and use it as appropriate. If you choose to specify storage, either broadly (for example, all storage on controller 1) or specifically (for example, use c1t4d2, but do not use c1t4d1), Solaris Volume Manager will use the storage you specify as needed.
3. Use the following form of the metassist command to create a two-way mirror:
metassist create -s diskset-name [-r redundancy] -S size
create is the subcommand to create volumes.
-s diskset-name specifies the name of the disk set to use for the volumes.
-r redundancy specifies the level of redundancy (number of data copies) to create.
-S size specifies the size of the volume to create in KB, MB, GB, or TB, for kilobytes, megabytes, gigabytes, and terabytes, respectively.
See the following examples and the metassist(1M) man page for more information.
4. Use the metastat command to view the new volumes (two striped submirrors and one mirror):
metastat -s diskset-name
This example shows how to create a two-way mirror, 10Mb in size, with the metassist command. The metassist command identifies unused disks and creates the best mirror possible using those disks. The -s myset argument specifies that the volumes will be created in the myset disk set, which will be created if necessary.
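A command of the following form, shown here as a sketch using the same myset disk set, would produce such a mirror:
# metassist create -s myset -r 2 -S 10mb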
Example: Creating a Two-Way Mirror and Hot Spare With the metassist Command
# metassist create -s myset -f -r 2 -S 10mb
This example shows how to use the metassist command to create a two-way mirror, 10Mb in size, with a hot spare to provide additional fault tolerance (the -f option specifies fault tolerance).
This example shows how to use the metassist command to create a stripe using disks available on controller 1 (the -a option specifies the available controller).
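A command of the following form, shown as a sketch (the disk set name myset and the controller c1 are assumptions used only for illustration), would request such a stripe:
# metassist create -s myset -a c1 -S 10mb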
Why a volume was created in a certain way
Why a volume was not created
What volumes the metassist command would create, without actually creating the volumes
3. Use the following form of the metassist command to create a stripe and specify verbose output:
metassist create -s diskset-name -S size [-v verbosity]
create is the subcommand to create volumes.
-s diskset-name specifies the name of the disk set to use for the volumes.
-S size specifies the size of the volume to create in KB, MB, GB, or TB, for kilobytes, megabytes, gigabytes, and terabytes, respectively.
-v verbosity specifies how verbose the output should be. The default level is 1, and allowable values range from 0 (nearly silent output) to 2 (significant output).
See the following examples and the metassist(1M) man page for more information.
4. Use the metastat command to view the new volume:
metastat -s diskset-name
This example shows how to use the metassist command to create a two-way mirror, 10Mb in size, with a hot spare to provide additional fault tolerance (the -f option specifies fault tolerance). The final argument (-v 2) specifies a verbosity level of two, which is the maximum level and will provide the most information possible about how the metassist command worked.
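A command of the following form, shown here as a sketch using the myset disk set from the earlier examples, matches this description:
# metassist create -s myset -f -r 2 -S 10mb -v 2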
This example shows how to use the metassist command to create a two-way mirror, 10Mb in size, with a hot spare to provide additional fault tolerance (the -f option specifies fault tolerance). The final argument (-v 0) specifies a verbosity level of zero, which is the minimum level and will provide nearly silent output when the command runs.
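The corresponding command, again sketched with the hypothetical myset disk set, might be:
# metassist create -s myset -f -r 2 -S 10mb -v 0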
1. Make sure that you have the necessary prerequisites for using top down volume creation (the metassist command).
2. Identify available storage on which to create the volume.
3. Use the following form of the metassist command to create a stripe and specify that the volume should not actually be created, but that a command sequence (shell script) to create the volumes should be sent to standard output:
metassist create -s diskset-name -S size [-c]
create is the subcommand to create volumes.
-s diskset-name specifies the name of the disk set to use for the volumes.
-S size specifies the size of the volume to create in KB, MB, GB, or TB, for kilobytes, megabytes, gigabytes, and terabytes, respectively.
-c specifies that the volume should not actually be created. Instead, a shell script that can be used to create the specified configuration will be sent to standard output.
See the following examples and the metassist(1M) man page for more information. Note that the shell script required by the -c argument will be sent to standard output, while the rest of the output from the metassist command goes to standard error, so you can redirect the output streams as you choose.
This example shows how to use the metassist command to create a two-way mirror, 10Mb in size, with a hot spare to provide additional fault tolerance (the -f option specifies fault tolerance). The final argument (-c) specifies that the volume should not actually be created, but rather that a shell script that could be used to create the specified configuration should be sent to standard output.
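A command of the following form, sketched with the myset disk set used earlier, matches this description:
# metassist create -s myset -f -r 2 -S 10mb -c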
This example shows how to use the metassist command to create a two-way mirror, 10Mb in size, with a hot spare to provide additional fault tolerance (the -f option specifies fault tolerance). The final argument (-c) specifies that the volume should not actually be created, but rather that a shell script that could be used to create the specified configuration should be sent to standard output. The end of the command redirects standard output to create the /tmp/metassist-shell-script.sh shell script that can later be used to create the specified volume.
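Sketched in full, using the same hypothetical disk set, the command might look like this:
# metassist create -s myset -f -r 2 -S 10mb -c > /tmp/metassist-shell-script.sh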
Creating a Volume With a Saved Shell Script Created by the metassist Command
After you have created a shell script with the metassist command, you can use that script to create the volumes specified when the shell script was created.
Note The command script created by the metassist command has significant dependencies on the specific system configuration of the system on which the script was created, at the time the script was created. Using the script on different systems or after any changes to the system configuration can lead to data corruption or loss.
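Running the saved script is an ordinary shell invocation; a minimal sketch, assuming the file created in the previous example, is:
# sh /tmp/metassist-shell-script.sh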
1. Make sure that you have the necessary prerequisites for using top down volume creation (the metassist command).
2. Identify available storage on which to create the volume.
3. Use the following form of the metassist command to create a stripe and specify that the volume should not actually be created, but that a volume configuration file describing the proposed volumes should be sent to standard output:
metassist create -s diskset-name -S size [-d]
create is the subcommand to create volumes.
-s diskset-name specifies the name of the disk set to use for the volumes.
-S size specifies the size of the volume to create in KB, MB, GB, or TB, for kilobytes, megabytes, gigabytes, and terabytes, respectively.
-d specifies that the volume should not actually be created. Instead, an XML-based volume configuration file that can eventually be used to create the specified configuration will be sent to standard output.
See the following examples and the metassist(1M) man page for more information. Note that the XML-based volume configuration file required by the -d argument will be sent to standard output, while the rest of the output from the metassist command goes to standard error, so you can redirect the output streams as you choose.
This example shows how to use the metassist command to create a two-way mirror, 10Mb in size, with a hot spare to provide additional fault tolerance (the -f option specifies fault tolerance). The final argument (-d) specifies that the volume should not actually be created, but rather that a volume configuration file that could eventually be used to create the specified configuration should be sent to standard output.
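A command of the following form, sketched with the hypothetical myset disk set, matches this description:
# metassist create -s myset -f -r 2 -S 10mb -d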
This example shows how to use the metassist command to create a two-way mirror, 10Mb in size, with a hot spare to provide additional fault tolerance (the -f option specifies fault tolerance). The final argument (-d) specifies that the volume should not actually be created, but rather that a volume configuration file that could eventually be used to create the specified configuration should be sent to standard output. The end of the command redirects standard output to create the /tmp/metassist-volumeconfig.xml volume configuration file that can later be used to create the specified volume.
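Sketched in full, again with the hypothetical myset disk set, the command might look like this:
# metassist create -s myset -f -r 2 -S 10mb -d > /tmp/metassist-volumeconfig.xml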
Note When you edit the file, you must ensure that the file continues to be compliant with the /usr/share/lib/xml/dtd/volume-defaults.dtd Document Type Definition. If the XML file is not compliant with the DTD, the metassist command will fail with an error message.
This example shows how to use the metassist command to create a 10GB stripe, using exactly four slices and an interlace value of 512KB, as specified in the /etc/default/metassist.xml file.
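Because the constraints come from the defaults file, the command line itself can stay simple; a sketch (the disk set name is assumed only for illustration) might be:
# metassist create -s myset -S 10gb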
Chapter 24 Monitoring and Error Reporting (Tasks)
Solaris Volume Manager Monitoring and Reporting (Task Map) on page 264
Configuring the mdmonitord Command for Periodic Error Checking on page 264
Solaris Volume Manager SNMP Agent Overview on page 265
Configuring the Solaris Volume Manager SNMP Agent on page 266
Limitations of the Solaris Volume Manager SNMP Agent on page 268
Monitoring Solaris Volume Manager With a cron Job on page 269
Task: Set the mdmonitord daemon to periodically check for errors
Description: Set the error-checking interval used by the mdmonitord daemon by editing the /etc/rc2.d/S95svm.sync file.
Instructions: Configuring the mdmonitord Command for Periodic Error Checking on page 264

Task: Configure the Solaris Volume Manager SNMP agent
Description: Edit the configuration files in the /etc/snmp/conf directory so Solaris Volume Manager will throw traps appropriately, to the correct system.
Instructions: Configuring the Solaris Volume Manager SNMP Agent on page 266

Task: Monitor Solaris Volume Manager with scripts run by the cron command
Description: Create or adapt a script to check for errors, then run the script from the cron command.
Instructions: Monitoring Solaris Volume Manager With a cron Job on page 269
1. Become superuser.
2. Edit the /etc/rc2.d/S95svm.sync script and change the line that starts the mdmonitord command by adding a -t flag and the number of seconds between checks.
if [ -x $MDMONITORD ]; then
        $MDMONITORD -t 3600
        error=$?
        case $error in
        0)      ;;
        *)      echo "Could not start $MDMONITORD. Error $error."
                ;;
        esac
fi
These packages are part of the Solaris operating environment and are normally installed by default unless the package selection was modified at install time or a minimal set of packages was installed. After you confirm that all five packages are available (by using the pkginfo pkgname command, as in pkginfo SUNWsasnm), you need to configure the Solaris Volume Manager SNMP agent, as described in the following section.
3. Edit the /etc/snmp/conf/mdlogd.acl file to specify which hosts should receive SNMP traps. Look in the file for the following:
trap = {
        {
        trap-community = SNMP-trap
        hosts = corsair
                {
                enterprise = "Solaris Volume Manager"
                trap-num = 1, 2, 3
                }
Change the line that contains hosts = corsair to specify the host name that you want to receive Solaris Volume Manager SNMP traps. For example, to send SNMP traps to lexicon, you would edit the line to hosts = lexicon. If you want to include multiple hosts, provide a comma-delimited list of host names, as in hosts = lexicon, idiom.
4. Also edit the /etc/snmp/conf/snmpdx.acl file to specify which hosts should receive the SNMP traps. Find the block that begins with trap = and add the same list of hosts that you added in the previous step. This section might be commented out with #s. If so, you must remove the # at the beginning of the required lines in this section. Additional lines in the trap section are also commented out, but you can leave those lines alone or delete them for clarity. After uncommenting the required lines and updating the hosts line, this section could look like this:
###################
# trap parameters #
###################
trap = {
  {
        trap-community = SNMP-trap
        hosts = lexicon
        {
          enterprise = "sun"
          trap-num = 0, 1, 2-5, 6-16
        }
#       {
#         enterprise = "3Com"
#         trap-num = 4
#       }
#       {
#         enterprise = "snmp"
#         trap-num = 0, 2, 5
#       }
#  }
#  {
#       trap-community = jerry-trap
#       hosts = jerry, nanak, hubble
#       {
#         enterprise = "sun"
#         trap-num = 1, 3
#       }
#       {
#         enterprise = "snmp"
#         trap-num = 1-3
#       }
  }
}
Note Make sure that you have the same number of opening and closing brackets in the /etc/snmp/conf/snmpdx.acl file.
5. Add a new Solaris Volume Manager section to the /etc/snmp/conf/snmpdx.acl file, inside the section that you uncommented in the previous step.
trap-community = SNMP-trap
hosts = lexicon
{
  enterprise = "sun"
  trap-num = 0, 1, 2-5, 6-16
}
{
  enterprise = "Solaris Volume Manager"
  trap-num = 1, 2, 3
}
Note that the added four lines are placed immediately after the enterprise = "sun" block.
6. Append the following line to the /etc/snmp/conf/enterprises.oid file:
"Solaris Volume Manager" "1.3.6.1.4.1.42.104"
A RAID 1 or RAID 5 subcomponent goes into a Needs Maintenance state
A hot spare is swapped into service
A hot spare starts to resynchronize
A hot spare completes resynchronization
A mirror is taken offline
A disk set is taken by another host and the current host panics
Many problematic situations, such as an unavailable disk with RAID 0 volumes or soft partitions on it, do not result in SNMP traps, even when reads and writes to the device are attempted. SCSI or IDE errors are generally reported in these cases, but other SNMP agents must issue traps for those errors to be reported to a monitoring console.
To check your Solaris Volume Manager configuration for errors automatically, create a script that the cron utility can run periodically. The following example shows a script that you can adapt and modify for your needs.
Note This script serves as a starting point for automating Solaris Volume Manager error checking. You will probably need to modify this script for your own configuration.
# #ident "@(#)metacheck.sh 1.3 96/06/21 SMI" #!/bin/ksh #!/bin/ksh -x #!/bin/ksh -v # ident=%Z%%M% %I% %E% SMI # # Copyright (c) 1999 by Sun Microsystems, Inc. # # metacheck # # Check on the status of the metadevice configuration. If there is a problem # return a non zero exit code. Depending on options, send email notification. # # -h # help # -s setname # Specify the set to check. By default, the local set will be checked. # -m recipient [recipient...] # Send email notification to the specified recipients. This # must be the last argument. The notification shows up as a short # email message with a subject of # "Solaris Volume Manager Problem: metacheck.who.nodename.setname" # which summarizes the problem(s) and tells how to obtain detailed # information. The "setname" is from the -s option, "who" is from # the -w option, and "nodename" is reported by uname(1). # Email notification is further affected by the following options: # -f to suppress additional messages after a problem # has been found. # -d to control the supression. Chapter 24 Monitoring and Error Reporting (Tasks) 269
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
-w -t
to identify who generated the email. to force email even when there is no problem.
-w who indicate who is running the command. By default, this is the user-name as reported by id(1M). This is used when sending email notification (-m). -f Enable filtering. Filtering applies to email notification (-m). Filtering requires root permission. When sending email notification the file /etc/lvm/metacheck.setname.pending is used to controll the filter. The following matrix specifies the behavior of the filter: problem_found yes yes file_exists no Create file, send notification yes Resend notification if the current date (as specified by -d datefmt) is different than the file date. yes Delete file, send notification that the problem is resolved. no Send notification if -t specified.
no no
-d datefmt Specify the format of the date for filtering (-f). This option controls the how often re-notification via email occurs. If the current date according to the specified format (strftime(3C)) is identical to the date contained in the /etc/lvm/metacheck.setname.pending file then the message is suppressed. The default date format is "%D", which will send one re-notification per day. -t Test mode. Enable email generation even when there is no problem. Used for end-to-end verification of the mechanism and email addresses.
These options are designed to allow integration of metacheck into crontab. For example, a root crontab entry of: 0,15,30,45 * * * * /usr/sbin/metacheck -f -w SVMcron \ -d \%D \%h -m [email protected] [email protected] would check for problems every 15 minutes, and generate an email to [email protected] (and send to an email pager service) every hour when there is a problem. Note the \ prior to the % characters for a crontab entry. Bounced email would come back to root@nodename. The subject line for email generated by the above line would be Solaris Volume Manager Problem: metacheck.SVMcron.nodename.local
# display a debug line to controlling terminal (works in pipes) decho() { if [ "$debug" = "yes" ] ; then echo "DEBUG: $*" < /dev/null > /dev/tty 2>&1 fi 270 Solaris Volume Manager Administration Guide April 2004
} # if string $1 is in $2-* then return $1, else return "" strstr() { typeset look="$1" typeset ret="" shift decho "strstr LOOK .$look. FIRST .$1." while [ $# -ne 0 ] ; do if [ "$look" = "$1" ] ; then ret="$look" fi shift done echo "$ret"
} # if string $1 is in $2-* then delete it. return result strdstr() { typeset look="$1" typeset ret="" shift decho "strdstr LOOK .$look. FIRST .$1." while [ $# -ne 0 ] ; do if [ "$look" != "$1" ] ; then ret="$ret $1" fi shift done echo "$ret"
} merge_continued_lines() { awk -e \ BEGIN { line = "";} \ $NF == "\\" { \ $NF = ""; \ line = line $0; \ next; \ } \ $NF != "\\" { \ if ( line != "" ) { \ print line $0; \ line = ""; \ } else { \ print $0; \ } \ } }
# trim out stuff not associated with metadevices find_meta_devices() { typeset devices="" # decho "find_meta_devices .$*." while [ $# -ne 0 ] ; do case $1 in d+([0-9]) ) # metadevice name devices="$devices $1" ;; esac shift done echo "$devices"
} # return the list of top level metadevices toplevel() { typeset comp_meta_devices="" typeset top_meta_devices="" typeset devices="" typeset device="" typeset comp="" metastat$setarg -p | merge_continued_lines | while read line ; do echo "$line" devices=find_meta_devices $line set -- $devices if [ $# -ne 0 ] ; then device=$1 shift # check to see if device already refered to as component comp=strstr $device $comp_meta_devices if [ -z $comp ] ; then top_meta_devices="$top_meta_devices $device" fi # add components to component list, remove from top list while [ $# -ne 0 ] ; do comp=$1 comp_meta_devices="$comp_meta_devices $comp" top_meta_devices=strdstr $comp $top_meta_devices shift done fi done > /dev/null 2>&1 echo $top_meta_devices } # # - MAIN # METAPATH=/usr/sbin PATH=//usr/bin:$METAPATH 272 Solaris Volume Manager Administration Guide April 2004
USAGE="usage: metacheck [-s setname] [-h] [[-t] [-f [-d datefmt]] \ [-w who] -m recipient [recipient...]]" datefmt="%D" debug="no" filter="no" mflag="no" set="local" setarg="" testarg="no" who=id | sed -e s/^uid=[0-9][0-9]*(// -e s/).*// while getopts d:Dfms:tw: flag do case $flag in d) datefmt=$OPTARG; ;; D) debug="yes" ;; f) filter="yes" ;; m) mflag="yes" ;; s) set=$OPTARG; if [ "$set" != "local" ] ; then setarg=" -s $set"; fi ;; t) testarg="yes"; ;; w) who=$OPTARG; ;; \?) echo $USAGE exit 1 ;; esac done # if mflag specified then everything else part of recipient shift expr $OPTIND - 1 if [ $mflag = "no" ] ; then if [ $# -ne 0 ] ; then echo $USAGE exit 1 fi else if [ $# -eq 0 ] ; then echo $USAGE exit 1 fi fi recipients="$*" curdate_filter=date +$datefmt curdate=date Chapter 24 Monitoring and Error Reporting (Tasks) 273
node=uname -n # establish files msg_f=/tmp/metacheck.msg.$$ msgs_f=/tmp/metacheck.msgs.$$ metastat_f=/tmp/metacheck.metastat.$$ metadb_f=/tmp/metacheck.metadb.$$ metahs_f=/tmp/metacheck.metahs.$$ pending_f=/etc/lvm/metacheck.$set.pending files="$metastat_f $metadb_f $metahs_f $msg_f $msgs_f" rm -f $files > /dev/null 2>&1 trap "rm -f $files > /dev/null 2>&1; exit 1" 1 2 3 15 # Check to see if metadb is capable of running have_metadb="yes" metadb$setarg > $metadb_f 2>&1 if [ $? -ne 0 ] ; then have_metadb="no" fi grep "there are no existing databases" < $metadb_f > /dev/null 2>&1 if [ $? -eq 0 ] ; then have_metadb="no" fi grep "/dev/md/admin" < $metadb_f > /dev/null 2>&1 if [ $? -eq 0 ] ; then have_metadb="no" fi # check for problems accessing metadbs retval=0 if [ "$have_metadb" = "no" ] ; then retval=1 echo "metacheck: metadb problem, cant run $METAPATH/metadb$setarg" \ >> $msgs_f else # snapshot the state metadb$setarg 2>&1 | sed -e 1d | merge_continued_lines > $metadb_f metastat$setarg 2>&1 | merge_continued_lines > $metastat_f metahs$setarg -i 2>&1 | merge_continued_lines > $metahs_f # # Check replicas for problems, capital letters in the flags # indicate an error, fields are seperated by tabs. # problem=awk < $metadb_f -F\t {if ($1 ~ /[A-Z]/) print $1;} if [ -n "$problem" ] ; then retval=expr $retval + 64 echo "\ metacheck: metadb problem, for more detail run:\n\t$METAPATH/metadb$setarg -i" \ >> $msgs_f fi # # Check the metadevice state 274 Solaris Volume Manager Administration Guide April 2004
# problem=awk < $metastat_f -e \ /State:/ {if ($2 != "Okay" && $2 != "Resyncing") print $0;} if [ -n "$problem" ] ; then retval=expr $retval + 128 echo "\ metacheck: metadevice problem, for more detail run:" \ >> $msgs_f # refine the message to toplevel metadevices that have a problem top=toplevel set -- $top while [ $# -ne 0 ] ; do device=$1 problem=metastat $device | awk -e \ /State:/ {if ($2 != "Okay" && $2 != "Resyncing") print $0;} if [ -n "$problem" ] ; then echo "\t$METAPATH/metastat$setarg $device" >> $msgs_f # find out what is mounted on the device mp=mount|awk -e /\/dev\/md\/dsk\/$device[ \t]/{print $1;} if [ -n "$mp" ] ; then echo "\t\t$mp mounted on $device" >> $msgs_f fi fi shift done fi # # Check the hotspares to see if any have been used. # problem="" grep "no hotspare pools found" < $metahs_f > /dev/null 2>&1 if [ $? -ne 0 ] ; then problem=awk < $metahs_f -e \ /blocks/ { if ( $2 != "Available" ) print $0;} fi if [ -n "$problem" ] ; then retval=expr $retval + 256 echo "\ metacheck: hot spare in use, for more detail run:\n\t$METAPATH/metahs$setarg -i" \ >> $msgs_f fi fi # If any errors occurred, then mail the report if [ $retval -ne 0 ] ; then if [ -n "$recipients" ] ; then re="" if [ -f $pending_f ] && [ "$filter" = "yes" ] ; then re="Re: " # we have a pending notification, check date to see if we resend penddate_filter=cat $pending_f | head -1 if [ "$curdate_filter" != "$penddate_filter" ] ; then rm -f $pending_f > /dev/null 2>&1 Chapter 24 Monitoring and Error Reporting (Tasks) 275
else if [ "$debug" = "yes" ] ; then echo "metacheck: email problem notification still pending" cat $pending_f fi fi fi if [ ! -f $pending_f ] ; then if [ "$filter" = "yes" ] ; then echo "$curdate_filter\n\tDate:$curdate\n\tTo:$recipients" \ > $pending_f fi echo "\ Solaris Volume Manager: $node: metacheck$setarg: Report: $curdate" >> $msg_f echo "\ --------------------------------------------------------------" >> $msg_f cat $msg_f $msgs_f | mailx -s \ "${re}Solaris Volume Manager Problem: metacheck.$who.$set.$node" $recipients fi else cat $msgs_f fi else # no problems detected, if [ -n "$recipients" ] ; then # default is to not send any mail, or print anything. echo "\ Solaris Volume Manager: $node: metacheck$setarg: Report: $curdate" >> $msg_f echo "\ --------------------------------------------------------------" >> $msg_f if [ -f $pending_f ] && [ "$filter" = "yes" ] ; then # pending filter exista, remove it and send OK rm -f $pending_f > /dev/null 2>&1 echo "Problem resolved" >> $msg_f cat $msg_f | mailx -s \ "Re: Solaris Volume Manager Problem: metacheck.$who.$node.$set" $recipients elif [ "$testarg" = "yes" ] ; then # for testing, send mail every time even thought there is no problem echo "Messaging test, no problems detected" >> $msg_f cat $msg_f | mailx -s \ "Solaris Volume Manager Problem: metacheck.$who.$node.$set" $recipients fi else echo "metacheck: Okay" fi fi rm -f $files exit $retval > /dev/null 2>&1
For information on invoking scripts by using the cron utility, see the cron(1M) man page.
Chapter 25 Troubleshooting Solaris Volume Manager (Tasks)
Troubleshooting Solaris Volume Manager (Task Map) on page 277
Overview of Troubleshooting the System on page 278
Recovering from Disk Movement Problems on page 283
Recovering From Boot Problems on page 284
This chapter describes some Solaris Volume Manager problems and their appropriate solutions. It is not intended to be all-inclusive but rather to present common scenarios and recovery procedures.
Task: Replace a failed disk
Description: Replace a disk, then update state database replicas and logical volumes on the new disk.
Instructions: Replacing Disks on page 279

Task: Recover from disk movement problems
Instructions: Recovering from Disk Movement Problems on page 283

Task: Recover from improper /etc/vfstab entries
Description: Use the fsck command on the mirror, then edit the /etc/vfstab file so the system will boot correctly.
Instructions: How to Recover From Improper /etc/vfstab Entries on page 285

Task: Recover from a boot device failure
Description: Boot from a different submirror.
Instructions: How to Recover From a Boot Device Failure on page 287

Task: Recover from insufficient state database replicas
Description: Delete unavailable replicas by using the metadb command.
Instructions: How to Recover From Insufficient State Database Replicas on page 291

Task: Recover configuration data for a lost soft partition
Description: Use the metarecover command to recover configuration data for soft partitions.
Instructions: How to Recover Configuration Data for a Soft Partition on page 295

Task: Recover a Solaris Volume Manager configuration from salvaged disks
Description: Attach disks to a new system and have Solaris Volume Manager rebuild the configuration from the existing state database replicas.
Instructions: How to Recover a Configuration on page 297
Output from the metastat command
Output from the metastat -p command
Backup copy of the /etc/vfstab file
Backup copy of the /etc/lvm/mddb.cf file
Disk partition information, from the prtvtoc command (SPARC systems) or the fdisk command (x86-based systems)
Solaris version
Solaris patches installed
Solaris Volume Manager patches installed
Tip Any time you update your Solaris Volume Manager configuration, or make other storage or operating environment-related changes to your system, generate fresh copies of this configuration information. You could also generate this information automatically with a cron job.
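A minimal sketch of a script that collects this information, assuming the output directory /var/adm/svm-config (a hypothetical path), follows. It could be run by hand or from a cron entry.
#!/bin/sh
# Collect Solaris Volume Manager configuration information for later recovery.
DIR=/var/adm/svm-config
mkdir -p $DIR
# Volume state and the re-creatable configuration
metastat     > $DIR/metastat.out
metastat -p  > $DIR/metastat-p.out
# Copies of the key configuration files
cp /etc/vfstab      $DIR/vfstab.copy
cp /etc/lvm/mddb.cf $DIR/mddb.cf.copy
# On SPARC systems, save the partition table of each disk, for example:
# prtvtoc /dev/rdsk/c0t0d0s2 > $DIR/c0t0d0.vtoc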
Replacing Disks
This section describes how to replace disks in a Solaris Volume Manager environment.
Caution If you have soft partitions on a failed disk or on volumes built on a failed disk, you must put the new disk in the same physical location, with the same c*t*d* number as the disk it replaces.
The output shows three state database replicas on slice 4 of the local disks, c0t0d0 and c0t1d0. The W in the flags field of the c0t1d0s4 slice indicates that the device has write errors. Three replicas on the c0t0d0s4 slice are still good.
3. Record the slice name where the state database replicas reside and the number of state database replicas, then delete the state database replicas. The number of state database replicas is obtained by counting the number of appearances of a slice in the metadb command output. In this example, the three state database replicas that exist on c0t1d0s4 are deleted.
# metadb -d c0t1d0s4
Caution If, after deleting the bad state database replicas, you are left with three or fewer, add more state database replicas before continuing. This will help ensure that configuration information remains intact.
4. Locate and delete any hot spares on the failed disk. Use the metastat command to find hot spares. In this example, hot spare pool hsp000 included c0t1d0s6, which is then deleted from the pool.
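The deletion itself would use the metahs command; a sketch for this example is:
# metahs -d hsp000 c0t1d0s6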
5. Physically replace the failed disk.
6. Logically replace the failed disk using the devfsadm command, cfgadm command, luxadm command, or other commands as appropriate for your hardware and environment.
7. Update the Solaris Volume Manager state database with the device ID for the new disk using the metadevadm -u cntndn command. In this example, the new disk is c0t1d0.
# metadevadm -u c0t1d0
8. Repartition the new disk. Use the format command or the fmthard command to partition the disk with the same slice information as the failed disk. If you have the prtvtoc output from the failed disk, you can format the replacement disk with fmthard -s /tmp/failed-disk-prtvtoc-output.
9. If you deleted state database replicas, add the same number back to the appropriate slice. In this example, /dev/dsk/c0t1d0s4 is used.
# metadb -a -c 3 c0t1d0s4
10. If any slices on the disk are components of RAID 5 volumes or are components of RAID 0 volumes that are in turn submirrors of RAID 1 volumes, run the metareplace -e command for each slice. In this example, /dev/dsk/c0t1d0s4 and mirror d10 are used.
# metareplace -e d10 c0t1d0s4
11. If any soft partitions are built directly on slices on the replaced disk, run the metarecover -d -p command on each slice containing soft partitions to regenerate the extent headers on disk. In this example, /dev/dsk/c0t1d0s4 needs to have the soft partition markings on disk regenerated, so it is scanned and the markings are reapplied, based on the information in the state database replicas.
# metarecover c0t1d0s4 -d -p
12. If any soft partitions on the disk are components of RAID 5 volumes or are components of RAID 0 volumes that are submirrors of RAID 1 volumes, run the metareplace -e command for each slice. In this example, /dev/dsk/c0t1d0s4 and mirror d10 are used.
# metareplace -e d10 c0t1d0s4
13. If any RAID 0 volumes have soft partitions built on them, run the metarecover command for each RAID 0 volume. In this example, RAID 0 volume d17 has soft partitions built on it.
# metarecover d17 -m -p
14. Replace hot spares that were deleted, and add them to the appropriate hot spare pool or pools.
# metahs -a hsp000 c0t0d0s6
hsp000: Hotspare is added
15. If soft partitions or non-redundant volumes were affected by the failure, restore data from backups. If only redundant volumes were affected, then validate your data. Check the user/application data on all volumes. You might have to run an application-level consistency checker or use some other method to check the data.
root succeeded for root on /dev/console
SunOS 5.9       s81_39  May 2002
...
        first blk       block count
        16              8192
        8208            8192
        16400           8192
No data loss has occurred, and none will occur as a direct result of this problem. This error message indicates that the Solaris Volume Manager name records have been only partially updated, so output from the metastat command will likely show some of the c*t*d* names previously used, and some of the c*t*d* names reflecting the state after the move. If you need to update your Solaris Volume Manager configuration while this condition exists, you must use the c*t*d* names reported by the metastat command when you issue any meta* commands. If this error condition occurs, you can do one of the following to resolve the condition:
Restore all disks to their original locations. Next, do a reconfiguration reboot, or run (as a single command):
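A plausible form of that single command, shown here only as a sketch rather than as the exact command from the procedure, combines a device reconfiguration with an update of the device ID records:
# devfsadm && metadevadm -r
The devfsadm command rebuilds the /dev device entries, and metadevadm -r rechecks the devices on the system and updates the Solaris Volume Manager device ID information.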
After these commands complete, the error condition will be resolved and you can continue.
The /etc/vfstab file contains incorrect information. See How to Recover From Improper /etc/vfstab Entries on page 285.
There are not enough state database replicas. See How to Recover From Insufficient State Database Replicas on page 291.
A boot device (disk) has failed. See How to Recover From a Boot Device Failure on page 287.
The boot mirror has failed. See How to Recover From a Boot Device Failure on page 287.
If Solaris Volume Manager takes a volume offline due to errors, unmount all file systems on the disk where the failure occurred. Because each disk slice is independent, multiple file systems can be mounted on a single disk. If the software has encountered a failure, other slices on the same disk will likely experience failures soon. File systems mounted directly on disk slices do not have the protection of Solaris Volume Manager error handling, and leaving such file systems mounted can leave you vulnerable to crashing the system and losing data.
Minimize the amount of time you run with submirrors disabled or offline. During resynchronization and online backup intervals, the full protection of mirroring is gone.
Because of the errors, you automatically go into single-user mode when the system is booted:
ok boot
...
configuring network interfaces: hme0.
Hostname: lexicon
mount: /dev/dsk/c0t3d0s0 is not this fstype.
setmnt: Cannot open /etc/mnttab for writing

INIT: Cannot create /var/adm/utmp or /var/adm/utmpx

INIT: failed write of utmpx entry:"    "

INIT: failed write of utmpx entry:"    "

INIT: SINGLE USER MODE

Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance): <root-password>
At this point, root (/) and /usr are mounted read-only. Follow these steps:
1. Run the fsck command on the root (/) mirror.
Note Be careful to use the correct volume for root.
# fsck /dev/md/rdsk/d0
** /dev/md/rdsk/d0
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2274 files, 11815 used, 10302 free (158 frags, 1268 blocks, 0.7% fragmentation)
2. Remount root (/) read/write so you can edit the /etc/vfstab file.
# mount -o rw,remount /dev/md/dsk/d0 /
mount: warning: cannot lock temp file </etc/.mnt.lock>
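3. Run the metaroot command so the system files reference the root volume. A sketch of this step, assuming the d0 root mirror used in this example, is:
# metaroot d0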
This command edits the /etc/system and /etc/vfstab files to specify that the root (/) file system is now on volume d0.
4. Verify that the /etc/vfstab file contains the correct volume entries. The root (/) entry in the /etc/vfstab file should appear as follows so that the entry for the file system correctly references the RAID 1 volume:
/dev/md/dsk/d0     /dev/md/rdsk/d0    /       ufs     1       no      -
In the following example, the boot device, which contains two of the six state database replicas and the root (/), swap, and /usr submirrors, fails. Initially, when the boot device fails, you'll see a message similar to the following. This message might differ among various architectures.
Rebooting with command:
Boot device: /iommu/sbus/dma@f,81000/esp@f,80000/sd@3,0
The selected SCSI device is not responding
Can't open boot device
...
When you see this message, note the device. Then, follow these steps: 1. Boot from another root (/) submirror. Since only two of the six state database replicas in this example are in error, you can still boot. If this were not the case, you would need to delete the inaccessible state database replicas in single-user mode. This procedure is described in How to Recover From Insufficient State Database Replicas on page 291. When you created the mirror for the root (/) le system, you should have recorded the alternate boot device as part of that procedure. In this example, disk2 is that alternate boot device.
ok boot disk2
SunOS Release 5.9 Version s81_51 64-bit
Copyright 1983-2001 Sun Microsystems, Inc.
Hostname: demo
...
demo console login: root
Password: <root-password>
Dec 16 12:22:09 lexicon login: ROOT LOGIN /dev/console
Last login: Wed Dec 12 10:55:16 on console
Sun Microsystems Inc.   SunOS 5.9       s81_51  May 2002
...
2. Determine that two state database replicas have failed by using the metadb command.
# metadb
        flags           first blk       block count
    M     p             unknown         unknown
    M     p             unknown         unknown
    a m   p  luo        16              1034
    a     p  luo        1050            1034
    a     p  luo        16              1034
    a     p  luo        1050            1034
The system can no longer detect state database replicas on slice /dev/dsk/c0t3d0s3, which is part of the failed disk. 3. Determine that half of the root (/), swap, and /usr mirrors have failed by using the metastat command.
# metastat
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
...

d10: Submirror of d0
    State: Needs maintenance
    Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
    Size: 47628 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t3d0s0        0       No     Maintenance

d20: Submirror of d0
    State: Okay
    Size: 47628 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t2d0s0        0

d1: Mirror
    Submirror 0: d11
      State: Needs maintenance
    Submirror 1: d21
      State: Okay
...

d11: Submirror of d1
    State: Needs maintenance
    Invoke: "metareplace d1 /dev/dsk/c0t3d0s1 <new device>"
    Size: 69660 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t3d0s1        0       No     Maintenance

d21: Submirror of d1
    State: Okay
    Size: 69660 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t2d0s1        0

d2: Mirror
    Submirror 0: d12
      State: Needs maintenance
    Submirror 1: d22
      State: Okay
...

d12: Submirror of d2
    State: Needs maintenance
    Invoke: "metareplace d2 /dev/dsk/c0t3d0s6 <new device>"
    Size: 286740 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t3d0s6        0       No     Maintenance

d22: Submirror of d2
    State: Okay
    Size: 286740 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t2d0s6        0
In this example, the metastat command shows that the following submirrors need maintenance:
Submirror d10, device c0t3d0s0
Submirror d11, device c0t3d0s1
Submirror d12, device c0t3d0s6
4. Halt the system, replace the disk, and use the format command or the fmthard command to partition the disk as it was before the failure.
Tip If the new disk is identical to the existing disk (the intact side of the mirror in this example), use prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t3d0s2 to quickly format the new disk (c0t3d0 in this example).
# halt
...
Halted
...
ok boot
...
# format /dev/rdsk/c0t3d0s0
5. Reboot. Note that you must reboot from the other half of the root (/) mirror. You should have recorded the alternate boot device when you created the mirror.
# halt
...
ok boot disk2
6. To delete the failed state database replicas and then add them back, use the metadb command.
# metadb
        flags           first blk       block count
    M     p             unknown         unknown
    M     p             unknown         unknown
    a m   p  luo        16              1034
    a     p  luo        1050            1034
    a     p  luo        16              1034
    a     p  luo        1050            1034
# metadb -d c0t3d0s3
# metadb -c 2 -a c0t3d0s3
# metadb
        flags           first blk
    a m   p  luo        16
    a     p  luo        1050
    a     p  luo        16
    a     p  luo        1050
    a        u          16
    a        u          1050
After some time, the resynchronization will complete. You can now return to booting from the original device.
3. If one or more disks are known to be unavailable, delete the state database replicas on those disks. Otherwise, delete enough errored state database replicas (W, M, D, F, or R status flags reported by metadb) to ensure that a majority of the existing state database replicas are not errored. Delete the state database replica on the bad disk using the metadb -d command.
Tip State database replicas with a capitalized status flag are in error, while those with lowercase status flags are functioning normally.
4. Verify that the replicas have been deleted by using the metadb command. 5. Reboot.
6. If necessary, you can replace the disk, format it appropriately, then add any state database replicas needed to the disk, following the instructions in Creating State Database Replicas on page 62. Once you have a replacement disk, halt the system, replace the failed disk, and once again, reboot the system. Use the format command or the fmthard command to partition the disk as it was configured before the failure.
...
Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 270MHz), No Keyboard
OpenBoot 3.11, 128 MB memory installed, Serial #9841776.
Ethernet address 8:0:20:96:2c:70, Host ID: 80962c70.

Rebooting with command: boot -s
Boot device: /pci@1f,0/pci@1,1/ide@3/disk@0,0:a
SunOS Release 5.9 Version s81_39 64-bit
Copyright 1983-2001 Sun Microsystems, Inc.
configuring IPv4 interfaces: hme0.
Hostname: dodo

metainit: dodo: stale databases

Insufficient metadevice database replicas located.

Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.

Type control-d to proceed with normal startup,
(or give root password for system maintenance): <root-password>
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

Jun 7 08:57:25 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.9       s81_39  May 2002
# metadb -i
        flags           first blk       block count
     a m  p  lu         16              8192            /dev/dsk/c0t0d0s7
     a    p  l          8208            8192            /dev/dsk/c0t0d0s7
     a    p  l          16400           8192            /dev/dsk/c0t0d0s7
    M     p             16              unknown         /dev/dsk/c1t1d0s0
    M     p             8208            unknown         /dev/dsk/c1t1d0s0
    M     p             16400           unknown         /dev/dsk/c1t1d0s0
    M     p             24592           unknown         /dev/dsk/c1t1d0s0
    M     p             32784           unknown         /dev/dsk/c1t1d0s0
    M     p             40976           unknown         /dev/dsk/c1t1d0s0
    M     p             49168           unknown         /dev/dsk/c1t1d0s0
# metadb -d c1t1d0s0
# metadb
        flags           first blk       block count
     a m  p  lu         16              8192            /dev/dsk/c0t0d0s7
     a    p  l          8208            8192            /dev/dsk/c0t0d0s7
     a    p  l          16400           8192            /dev/dsk/c0t0d0s7
#
The system panicked because it could no longer detect state database replicas on slice /dev/dsk/c1t1d0s0, which is part of the failed disk or attached to a failed controller. The first metadb -i command identifies the replicas on this slice as having a problem with the master blocks. When you delete the stale state database replicas, the root (/) file system is read-only. You can ignore the mddb.cf error messages displayed. At this point, the system is again functional, although it probably has fewer state database replicas than it should, and any volumes that used part of the failed storage are also either failed, errored, or hot-spared; those issues should be addressed promptly.
Panics
If a file system detects any internal inconsistencies while it is in use, it will panic the system. If the file system is configured for logging, it notifies the transactional volume that it needs to be checked at reboot. The transactional volume transitions itself to the Hard Error state. All other transactional volumes that share the same log device also go into the Hard Error state. At reboot, fsck checks and repairs the file system and transitions the file system back to the Okay state. fsck completes this process for all transactional volumes listed in the /etc/vfstab file for the affected log device.
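If you would rather check a transactional volume by hand instead of waiting for the boot-time fsck, you can run fsck against the raw volume; d5 below is a hypothetical trans metadevice name:

# fsck /dev/md/rdsk/d5

After a successful check, the transactional volume transitions back to the Okay state.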
Note If your configuration included other Solaris Volume Manager volumes that were built on top of soft partitions, you should recover the soft partitions before attempting to recover the other volumes.
Configuration information about your soft partitions is stored on your devices and in your state database. Since either of these sources could be corrupt, you must tell the metarecover command which source is reliable. First, use the metarecover command to determine whether the two sources agree. If they do agree, the metarecover command cannot be used to make any changes. If the metarecover command reports an inconsistency, however, you must examine its output carefully to determine whether the disk or the state database is corrupt, and then use the metarecover command to rebuild the configuration based on the appropriate source.
1. Read the Configuration Guidelines for Soft Partitions on page 128.
2. Review the soft partition recovery information by using the metarecover command.
# metarecover component -p -d
In this case, component is the c*t*d*s* name of the raw component. The -d option directs metarecover to scan the physical slice for soft partition extent headers. For more information, see the metarecover(1M) man page.
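For example, against a hypothetical slice c1t1d0s1 (the slice name is a placeholder), the comparison described above would typically be run first, and the rebuild from the on-disk extent headers only afterward:

# metarecover c1t1d0s1 -p
# metarecover c1t1d0s1 -p -d

The first form reports whether the extent headers and the state database agree; the second directs metarecover to trust the extent headers on the slice and rebuild the soft partition configuration from them.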
This example recovers three soft partitions from disk, after the state database replicas were accidentally deleted.
Note This process only works to recover volumes from the local disk set.
3. Determine the major/minor number for a slice containing a state database replica on the newly added disks. Use ls -lL and note the two numbers between the group name and the date. These are the major and minor numbers for this slice.
# ls -lL /dev/dsk/c1t9d0s7
brw-r-----   1 root     sys       32, 71 Dec  5 10:05 /dev/dsk/c1t9d0s7
4. If necessary, determine the major name corresponding to the major number by looking up the major number in /etc/name_to_major.
# grep " 32" /etc/name_to_major
sd 32
5. Update the /kernel/drv/md.conf file with two entries: one entry to tell Solaris Volume Manager where to find a valid state database replica on the new disks, and one entry to tell it to trust the new replica and ignore any conflicting device ID information on the system. In the line in this example that begins with mddb_bootlist1, replace the sd in the example with the major name you found in the previous step. Replace 71 in the example with the minor number you identified in Step 3.
#pragma ident   "@(#)md.conf    2.1     00/07/07 SMI"
#
# Copyright (c) 1992-1999 by Sun Microsystems, Inc.
# All rights reserved.
#
name="md" parent="pseudo" nmd=128 md_nsets=4;
# Begin MDD database info (do not edit)
mddb_bootlist1="sd:71:16:id0"; md_devid_destroy=1;
# End MDD database info (do not edit)
6. Reboot to force Solaris Volume Manager to reload your configuration. You will see messages similar to the following displayed on the console.
volume management starting.
Dec  5 10:11:53 lexicon metadevadm: Disk movement detected
Dec  5 10:11:53 lexicon metadevadm: Updating device names in Solaris Volume Manager
The system is ready.
# metastat
d12: RAID
    State: Okay
    Interlace: 32 blocks
    Size: 125685 blocks
Original device:
    Size: 128576 blocks
        Device       Start Block  Dbase  State  Reloc  Hot Spare
        c1t11d0s3         330     No     Okay   Yes
        c1t12d0s3         330     No     Okay   Yes
        c1t13d0s3         330     No     Okay   Yes
d20: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent 0
d21: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent 0
d22: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent 0
d10: Mirror
    Submirror 0: d0
      State: Okay
    Submirror 1: d1
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 82593 blocks
d0: Submirror of d10
    State: Okay
    Size: 118503 blocks
    Stripe 0: (interlace: 32 blocks)
        Device       Start Block  Dbase  State  Reloc  Hot Spare
        c1t9d0s0            0     No     Okay   Yes
        c1t10d0s0        3591     No     Okay   Yes
d1: Submirror of d10
    State: Okay
    Size: 82593 blocks
    Stripe 0: (interlace: 32 blocks)
        Device       Start Block  Dbase  State  Reloc  Hot Spare
        c1t9d0s1            0     No     Okay   Yes
        c1t10d0s1           0     No     Okay   Yes

Device Relocation Information:
Device     Reloc   Device ID
c1t9d0     Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3487980000U00907AZ
c1t10d0    Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3397070000W0090A8Q
c1t11d0    Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3449660000U00904NZ
c1t12d0    Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS32655400007010H04J
c1t13d0    Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3461190000701001T0
#
# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t9d0s7
     a        luo       16              8192            /dev/dsk/c1t10d0s7
     a        luo       16              8192            /dev/dsk/c1t11d0s7
     a        luo       16              8192            /dev/dsk/c1t12d0s7
     a        luo       16              8192            /dev/dsk/c1t13d0s7
# metastat
d12: RAID
    State: Okay
    Interlace: 32 blocks
    Size: 125685 blocks
Original device:
    Size: 128576 blocks
        Device       Start Block  Dbase  State  Reloc  Hot Spare
        c1t11d0s3         330     No     Okay   Yes
        c1t12d0s3         330     No     Okay   Yes
        c1t13d0s3         330     No     Okay   Yes
d20: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent 0
d21: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent 0
d22: Soft Partition
    Device: d10
    State: Okay
    Size: 8192 blocks
        Extent 0
d10: Mirror
    Submirror 0: d0
      State: Okay
    Submirror 1: d1
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 82593 blocks
d0: Submirror of d10
    State: Okay
    Size: 118503 blocks
    Stripe 0: (interlace: 32 blocks)
        Device       Start Block  Dbase  State  Reloc  Hot Spare
        c1t9d0s0            0     No     Okay   Yes
        c1t10d0s0        3591     No     Okay   Yes
d1: Submirror of d10
    State: Okay
    Size: 82593 blocks
    Stripe 0: (interlace: 32 blocks)
        Device       Start Block  Dbase  State  Reloc  Hot Spare
        c1t9d0s1            0     No     Okay   Yes
        c1t10d0s1           0     No     Okay   Yes

Device Relocation Information:
Device     Reloc   Device ID
c1t9d0     Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3487980000U00907AZ1
c1t10d0    Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3397070000W0090A8Q
c1t11d0    Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3449660000U00904NZ
c1t12d0    Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS32655400007010H04J
c1t13d0    Yes     id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3461190000701001T0
# metastat -p
d12 -r c1t11d0s3 c1t12d0s3 c1t13d0s3 -k -i 32b
d20 -p d10 -o 3592 -b 8192
d21 -p d10 -o 11785 -b 8192
d22 -p d10 -o 19978 -b 8192
d10 -m d0 d1 1
d0 1 2 c1t9d0s0 c1t10d0s0 -i 32b
d1 1 2 c1t9d0s1 c1t10d0s1 -i 32b
#
APPENDIX A
Important Solaris Volume Manager Files
System Files and Startup Files on page 303
Manually Configured Files on page 305
/etc/lvm/mddb.cf
Caution Do not edit this file. If you change this file, you could corrupt your Solaris Volume Manager configuration.
The /etc/lvm/mddb.cf file records the locations of state database replicas. When state database replica locations change, Solaris Volume Manager makes an entry in the mddb.cf file that records the locations of all state databases. See mddb.cf(4) for more information.
/etc/lvm/md.cf
The /etc/lvm/md.cf file contains automatically generated configuration information for the default (unspecified or local) disk set. When you change the Solaris Volume Manager configuration, Solaris Volume Manager automatically updates the md.cf file (except for information about hot spares in use). See md.cf(4) for more information.
Caution Do not edit this file. If you change this file, you could corrupt your Solaris Volume Manager configuration or be unable to recover your Solaris Volume Manager configuration.
If your system loses the information maintained in the state database, and as long as no volumes were changed or created in the meantime, you can use the md.cf file to recover your configuration. See How to Initialize Solaris Volume Manager From a Configuration File on page 235.
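A minimal sketch of that recovery path, assuming the saved md.cf still matches the last known-good configuration: copy it over md.tab, review the entries, then do a dry run before activating:

# cp /etc/lvm/md.cf /etc/lvm/md.tab
# metainit -a -n
# metainit -a

Remove from /etc/lvm/md.tab any entries you do not intend to re-create before running the final command.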
/kernel/drv/md.conf
The md.conf configuration file is read by Solaris Volume Manager at startup. You can edit two fields in this file: nmd, which sets the number of volumes (metadevices) that the configuration can support, and md_nsets, which is the number of disk sets. The default value for nmd is 128, which can be increased to 8192. The default value for md_nsets is 4, which can be increased to 32. The total number of named disk sets is always one less than the md_nsets value, because the default (unnamed or local) disk set is included in md_nsets.
Note Keep the values of nmd and md_nsets as low as possible. Memory structures exist for all possible devices as determined by nmd and md_nsets, even if you have not created those devices. For optimal performance, keep nmd and md_nsets only slightly higher than the number of volumes you will use.
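For example, a system that needs room for up to 256 volumes and four named disk sets would carry a line like the following (md_nsets counts the local set, so it is set to 5); the values shown are illustrative only:

name="md" parent="pseudo" nmd=256 md_nsets=5;

The new limits take effect only after a reconfiguration reboot.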
/etc/rcS.d/S35svm.init
This file configures and starts Solaris Volume Manager at boot and allows administrators to start and stop the daemons.
/etc/rc2.d/S95svm.sync
This file checks the Solaris Volume Manager configuration at boot, starts resynchronization of mirrors if necessary, and starts the active monitoring daemon. (For more information, see mdmonitord(1M).)
/etc/lvm/md.tab
Once you have created and updated the /etc/lvm/md.tab file, the metainit, metahs, and metadb commands activate the volumes, hot spare pools, and state database replicas defined in the file. In the /etc/lvm/md.tab file, one complete configuration entry for a single volume appears on each line, using the syntax of the metainit, metadb, and metahs commands.
Note If you use metainit -an to simulate initializing all of the volumes in md.tab, you may see error messages for volumes that have dependencies on other volumes defined in md.tab. This occurs because Solaris Volume Manager does not maintain state for the volumes that would have been created when running metainit -an, so each line is evaluated against the existing configuration, if a configuration exists. Therefore, even if it appears that metainit -an would fail, it might succeed without the -n option.
You then run the metainit command with either the -a option, to activate all volumes in the /etc/lvm/md.tab file, or with the volume name that corresponds to a specific entry in the file.
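As an illustration only, a hypothetical md.tab fragment that defines a three-slice stripe named d35 with a 32-block interlace (all slice names are placeholders), followed by the two ways of activating it:

d35 1 3 c0t0d0s2 c1t0d0s2 c2t0d0s2 -i 32b

# metainit -a
# metainit d35

The first metainit form activates every entry in /etc/lvm/md.tab; the second activates only the d35 entry.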
Note Solaris Volume Manager does not write to or store configuration information in the /etc/lvm/md.tab file. You must edit the file by hand and run the metainit, metahs, or metadb commands to create Solaris Volume Manager components.
APPENDIX B
Command-Line Reference
Listed here are all the commands that you use to administer Solaris Volume Manager. For more detailed information, see the man pages.
TABLE B-1  Solaris Volume Manager Commands

Command        Description                                                           Man page
growfs         Expands a UFS file system in a nondestructive fashion.                growfs(1M)
metaclear      Deletes active volumes and hot spare pools.                           metaclear(1M)
metadb         Creates and deletes state database replicas.                          metadb(1M)
metadetach     Detaches a volume from a RAID 1 (mirror) volume, or a logging         metadetach(1M)
               device from a transactional volume.
metadevadm     Checks device ID configuration.                                       metadevadm(1M)
metahs         Manages hot spares and hot spare pools.                               metahs(1M)
metainit       Configures volumes.                                                   metainit(1M)
metaoffline    Places submirrors offline.                                            metaoffline(1M)
metaonline     Places submirrors online.                                             metaonline(1M)
metaparam      Modifies volume parameters.                                           metaparam(1M)
metarecover    Recovers configuration information for soft partitions.               metarecover(1M)
metarename     Renames and exchanges volume names.                                   metarename(1M)
metareplace    Replaces components in submirrors and RAID 5 volumes.                 metareplace(1M)
metaroot       Sets up system files for mirroring root (/).                          metaroot(1M)
metaset        Administers disk sets.                                                metaset(1M)
metastat       Displays the status of volumes or hot spare pools.                    metastat(1M)
metasync       Resynchronizes volumes during reboot.                                 metasync(1M)
metattach      Attaches a component to a RAID 0 or RAID 1 volume, or a log           metattach(1M)
               device to a transactional volume.
APPENDIX C
Solaris Volume Manager CIM/WBEM API
Attributes of and the operations against SVM devices
Relationships among the various SVM devices
Relationships among the SVM devices and other aspects of the operating system, such as file systems
This model is made available through the Solaris Web Based Enterprise Management (WBEM) SDK. The WBEM SDK is a set of Java technology-based APIs that allow access to system management capabilities that are represented by CIM. For more information about the CIM/WBEM SDK, see the Solaris WBEM Developer's Guide.
Index
A
adding hot spares, 162
alternate boot device, x86, 107
alternate boot path, 103

B
boot device, recovering from failure, 287
boot problems, 284
booting into single-user mode, 96

C
concatenated stripe
  definition, 72
  example with three stripes, 72
  removing, 85
concatenated volume, See concatenation
concatenation
  creating, 80
  definition, 70
  example with three slices, 71
  expanding, 83
  expanding UFS file system, 70
  information for creating, 74
  information for recreating, 74
  removing, 85
  usage, 70
configuration planning
  guidelines, 29
  overview, 29
  trade-offs, 31
cron command, 276

D
disk set, 203
  adding another host to, 217, 218
  adding disks to, 204
  adding drives to, 216, 217
  administering, 209, 210
  checking status, 219, 223, 224
  checking status in Enhanced Storage tool within the Solaris Management Console, 220
  creating, 215
  definition, 39, 44
  displaying owner, 220
  example, 224
  example with two shared disk sets, 207
  inability to use with /etc/vfstab file, 204
  increasing the default number, 238
  intended usage, 204
  placement of replicas, 204
  relationship to volumes and hot spare pools, 204
  releasing, 210, 220, 222
  reservation behavior, 210
  reservation types, 210
  reserving, 210, 222
  usage, 203
DiskSuite Tool, See graphical interface

E
enabling a hot spare, 169
enabling a slice in a RAID 5 volume, 149
enabling a slice in a submirror, 111
Enhanced Storage, See graphical interface
errors, checking for using a script, 269
/etc/lvm/md.cf file, 303
/etc/lvm/mddb.cf file, 303
/etc/rc2.d/S95lvm.sync file, 304
/etc/rcS.d/S35lvm.init file, 304
/etc/vfstab file, 121, 185, 199
  recovering from improper entries, 285

F
failover configuration, 44, 203
file system
  expanding by creating a concatenation, 82
  expansion overview, 41, 42
  growing, 239
  guidelines, 45
  panics, 294
  unmirroring, 122
fmthard command, 290, 292
format command, 290, 292
fsck command, 200, 201

G
general performance guidelines, 31
graphical interface, overview, 36
growfs command, 41, 240, 241, 307
growfs functionality, 42
GUI, sample, 37

H
hot spare, 154
  adding to a hot spare pool, 162
  conceptual overview, 154
  enabling, 169
  replacing in a hot spare pool, 167
hot spare pool, 44
  administering, 156
  associating, 164
  basic operation, 44
  changing association, 165
  conceptual overview, 153, 155
  creating, 161
  definition, 39, 44
  example with mirror, 155
  states, 166

I
I/O, 32
interfaces, See Solaris Volume Manager interfaces
interlace, specifying, 79

K
/kernel/drv/md.conf file, 238, 304

L
local disk set, 204
lockfs command, 124, 200
log device
  definition, 173
  problems when sharing, 200
  recovering from errors, 201
  shared, 173, 176
  sharing, 199
  space required, 176
logging device, hard error state, 294

M
majority consensus algorithm, 54
master device, definition, 173
md.cf file, 304
  recovering a Solaris Volume Manager configuration, 236
md.tab file, 236
  overview, 305
metaclear command, 85, 119, 120, 307
metadb command, 65, 307
metadetach command, 110, 119, 120, 307
metadevice, See volume
metahs command, 169, 307
metainit command, 184, 236, 307
metaoffline command, 110, 307
metaonline command, 308
metaparam command, 115, 163, 308
metarename command, 234, 308
metareplace command, 111, 149, 290, 308
metaroot command, 308
metaset command, 215, 220, 308
metassist, See top down volume creation
metastat command, 114, 145, 177, 308
metasync command, 308
metattach, task, 103
metattach command, 83, 109, 116, 308
  attach RAID 5 component, 148
  attach submirror, 237
mirror, 87
  and disk geometries, 95
  and online backup, 123
  attaching a submirror, 109
  changing options, 116
  definition, 40
  detach vs. offline, 95
  example with two submirrors, 88
  expanding, 116
  explanation of error states, 243
  guidelines, 90
  information for creating, 95
  information for replacing and enabling components, 244
  maintenance vs. last erred, 243
  options, 91
  overview of replacing and enabling components, 142
  overview of replacing and enabling slices, 241
  resynchronization, 93, 94
  sample status output, 114
  three-way mirror, 95
  two-way mirror, 100, 254, 255, 256, 257, 258, 259, 260, 261
mirroring
  file system that can be unmounted, 103
  read and write performance, 30
  root (/), /usr, and swap, 105
  unused slices, 99
N
newfs command, 201
O
online backup, 123
P
pass number
  and read-only mirror, 94
  defined, 94
R
RAID, levels supported in Solaris Volume Manager, 28
RAID 0 volume
  definition, 67, 68
  usage, 68
RAID 5 parity calculations, 141
RAID 5 volume
  and interlace, 140
  creating, 144
  definition, 28, 40
  enabling a failed slice, 149
  example with an expanded device, 139
  example with four slices, 138
  expanding, 148
  explanation of error states, 243
  information for replacing and enabling components, 244
  initializing slices, 137
  maintenance vs. last erred, 243
  overview of replacing and enabling slices, 241
  parity information, 137, 140
  replacing a failed slice, 151
  resynchronizing slices, 137
random I/O, 32
raw volume, 79, 80, 100, 145
read policies, overview, 92
releasing a disk set, 220, 222, 224
renaming volumes, 232
replica, 43
reserving a disk set, 222
resynchronization
  full, 93
  optimized, 93
  partial, 94
root (/)
  mirroring, 104
  unmirroring, 121
S
SCSI disk
  replacing, 279, 282, 283
sequential I/O, 33
shared disk set, 44
simple volume
  See RAID 0 volume
  definition, 40
slices
  adding to a RAID 5 volume, 148
  expanding, 82
soft partition
  checking status, 133
  creating, 132
  deleting, 135
  expanding, 134
  growing, 134
  recovering configuration for, 295
  removing, 135
soft partitioning
  definition, 128
  guidelines, 128
  locations, 128
Solaris Volume Manager
  See Solaris Volume Manager
  configuration guidelines, 45
  recovering the configuration, 236
Solaris Volume Manager elements, overview, 38
Solaris Volume Manager interfaces
  command line, 36
  sample GUI, 37
  Solaris Management Console, 36
state database
  conceptual overview, 43, 54
  corrupt, 54
  definition, 39, 43
  recovering from stale replicas, 291
state database replicas, 43
  adding larger replicas, 65
  basic operation, 54
  creating additional, 62
  creating multiple on a single slice, 56
  definition, 43
  errors, 58
  location, 44, 56, 57
  minimum number, 56
  recovering from stale replicas, 291
  two-disk configuration, 58
  usage, 53
status, 220
stripe
  creating, 79
  definition, 68
  example with three slices, 69
  expanding, 83
  information for creating, 74
  information for recreating, 74
  removing, 85
striped volume, See stripe
striping, definition, 68
submirror, 88
  attaching, 88
  detaching, 88
  enabling a failed slice, 111
  operation while offline, 88
  placing offline and online, 110
  replacing a failed slice, 118
  replacing entire, 119
swap
  mirroring, 105
  unmirroring, 122
system files, 303
T
three-way mirror, 95
top down volume creation, overview, 250
transactional volume
  and /etc/vfstab file, 184
  creating for a file system that cannot be unmounted, 185
  creating using metarename, 191, 192, 195
  creating using mirrors, 186
  definition, 40
  determining file systems to log, 176
  example with mirrors, 173
  example with shared log device, 174
  expanding, 194
  guidelines, 175
  recovering from errors, 202, 294
  removing using metarename, 196
  states, 177
  usage, 173
troubleshooting, general guidelines, 278
U
UFS logging, definition, 171
/usr
  logging, 185
  mirroring, 104
  unmirroring, 121
V
/var/adm/messages file, 242, 280
volume
  conceptual overview, 39
  default number, 237
  definition, 39
  expanding disk space, 41
  increasing the default number, 237
  name switching, 233, 234
  naming conventions, 42, 207
  renaming, 235
  types, 40
  uses, 40
  using file system commands on, 40
  virtual disk, 35
volume name switching, 43, 234
W
write policies, overview, 92