IBM Spectrum Scale
Version 5.0.2
Administration Guide
IBM
SC27-9288-01
Note
Before using this information and the product it supports, read the information in “Notices” on page 739.
This edition applies to version 5 release 0 modification 2 of the following products, and to all subsequent releases
and modifications until otherwise indicated in new editions:
v IBM Spectrum Scale ordered through Passport Advantage® (product number 5725-Q01)
v IBM Spectrum Scale ordered through AAS/eConfig (product number 5641-GPF)
v IBM Spectrum Scale for Linux on Z (product number 5725-S28)
v IBM Spectrum Scale for IBM ESS (product number 5765-ESS)
Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the
change.
IBM welcomes your comments; see the topic “How to send your comments” on page xxv. When you send
information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes
appropriate without incurring any obligation to you.
© Copyright IBM Corporation 2014, 2018.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents
Tables  xi
About this information  xiii
Prerequisite and related information  xxiv
Conventions used in this information  xxiv
How to send your comments  xxv
Summary of changes  xxvii
Chapter 1. Configuring the GPFS cluster  1
Creating your GPFS cluster  1
Displaying cluster configuration information  1
Basic configuration information  1
Information about protocol nodes  2
Adding nodes to a GPFS cluster  2
Deleting nodes from a GPFS cluster  3
Changing the GPFS cluster configuration data  4
Security mode  16
Running IBM Spectrum Scale commands without remote root login  17
Configuring sudo  17
Configuring the cluster to use sudo wrapper scripts  18
Configuring IBM Spectrum Scale GUI to use sudo wrapper  19
Configuring a cluster to stop using sudo wrapper scripts  19
Root-level processes that call administration commands directly  20
Node quorum considerations  20
Node quorum with tiebreaker considerations  20
Displaying and changing the file system manager node  21
Determining how long mmrestripefs takes to complete  22
Starting and stopping GPFS  22
Shutting down an IBM Spectrum Scale cluster  23
Chapter 2. Configuring the CES and protocol configuration  25
Configuring Cluster Export Services  25
Setting up Cluster Export Services shared root file system  25
Configuring Cluster Export Services nodes  26
Configuring CES protocol service IP addresses  26
CES IP aliasing to network adapters on protocol nodes  28
Deploying Cluster Export Services packages on existing IBM Spectrum Scale 4.1.1 and later nodes  32
Verifying the final CES configurations  33
Creating and configuring file systems and filesets for exports  33
Configuring with the installation toolkit  33
Deleting a Cluster Export Services node from an IBM Spectrum Scale cluster  34
Setting up Cluster Export Services groups in an IBM Spectrum Scale cluster  34
Chapter 3. Configuring and tuning your system for GPFS  37
General system configuration and tuning considerations  37
Clock synchronization  37
GPFS administration security  37
Cache usage  38
Access patterns  40
Aggregate network interfaces  40
Swap space  40
Linux configuration and tuning considerations  41
updatedb considerations  41
Memory considerations  41
GPFS helper threads  41
Communications I/O  42
Disk I/O  42
AIX configuration and tuning considerations  43
GPFS use with Oracle  43
Chapter 4. Parameters for performance tuning and optimization  45
Tuning parameters change history  47
Chapter 5. Ensuring high availability of the GUI service  53
Chapter 6. Configuring and tuning your system for Cloud services  55
Designating the Cloud services nodes  55
Starting up the Cloud services software  56
Managing a cloud storage account  57
| Amazon S3  57
| Swift3 account  58
| IBM Cloud Object Storage  58
| Openstack Swift  59
Defining cloud storage access points (CSAP)  60
Creating Cloud services  61
Configuring Cloud services with SKLM (optional)  62
Binding your file system or fileset to the Cloud service by creating a container pair set  63
Backing up the Cloud services database to the cloud  66
Backing up the Cloud services configuration  66
Configuring the maintenance windows  67
Enabling a policy for Cloud data sharing export service  69
Tuning Cloud services parameters  70
Integrating Cloud services metrics with the performance monitoring tool  72
GPFS-based configuration  73
curl commands for unified file and object access related user tasks  280
Configuration files for IBM Spectrum Scale for object storage  280
Backing up and restoring object storage  284
Backing up the object storage  284
Restoring the object storage  286
Configuration of object for isolated node and network groups  289
Enabling the object heatmap policy  290
Chapter 20. Managing GPFS quotas  293
Enabling and disabling GPFS quota management  293
Default quotas  294
Implications of quotas for different protocols  296
Explicitly establishing and changing quotas  297
Setting quotas for users on a per-project basis  298
Checking quotas  300
Listing quotas  301
Activating quota limit checking  302
Deactivating quota limit checking  303
Changing the scope of quota limit checking  303
Creating file system quota reports  303
Restoring quota files  304
Chapter 21. Managing GUI users  307
Chapter 22. Managing GPFS access control lists  311
Traditional GPFS ACL administration  311
Setting traditional GPFS access control lists  312
Displaying traditional GPFS access control lists  313
Applying an existing traditional GPFS access control list  313
Changing traditional GPFS access control lists  314
Deleting traditional GPFS access control lists  314
NFS V4 ACL administration  315
NFS V4 ACL Syntax  315
NFS V4 ACL translation  317
Setting NFS V4 access control lists  318
Displaying NFS V4 access control lists  318
Applying an existing NFS V4 access control list  318
Changing NFS V4 access control lists  318
Deleting NFS V4 access control lists  319
Considerations when using GPFS with NFS V4 ACLs  319
Authorizing protocol users  319
Authorizing file protocol users  319
Authorizing object users  329
Authorization limitations  335
Chapter 23. Native NFS and GPFS  337
Exporting a GPFS file system using NFS  337
Export considerations  338
NFS usage of GPFS cache  340
Synchronous writing using NFS  340
Unmounting a file system after NFS export  340
NFS automount considerations  341
Clustered NFS and GPFS on Linux  341
Chapter 24. Considerations for GPFS applications  343
Exceptions to Open Group technical standards  343
Determining if a file system is controlled by GPFS  343
Exceptions and limitations to NFS V4 ACLs support  344
Linux ACLs and extended attributes  344
General CES NFS Linux limitations  345
Considerations for the use of direct I/O (O_DIRECT)  345
Chapter 25. Accessing a remote GPFS file system  347
Remote user access to a GPFS file system  349
Using NFS/SMB protocol over remote cluster mounts  350
Configuring protocols on a separate cluster  351
Managing multi-cluster protocol environments  352
Upgrading multi-cluster environments  353
Limitations of protocols on remotely mounted file systems  353
Mounting a remote GPFS file system  354
Managing remote access to a GPFS file system  356
Using remote access with multiple network definitions  356
Using multiple security levels for remote access  358
Changing security keys with remote access  359
NIST compliance  360
Important information about remote access  361
Chapter 26. Information lifecycle management for IBM Spectrum Scale  363
Storage pools  363
Internal storage pools  364
External storage pools  369
Policies for automating file management  370
Overview of policies  370
Policy rules  371
The mmapplypolicy command and policy rules  391
Policy rules: Examples and tips  394
Managing policies  399
Working with external storage pools  403
Backup and restore with storage pools  408
ILM for snapshots  409
Filesets  410
Fileset namespace  411
Filesets and quotas  412
Filesets and storage pools  412
Filesets and global snapshots  412
Fileset-level snapshots  413
Filesets and backup  413
Managing filesets  414
Immutability and appendOnly features  417
Chapter 27. Creating and maintaining snapshots of file systems  421
Creating a snapshot  421
Listing snapshots  422
Restoring a file system from a snapshot  423
Chapter 28. Creating and managing file clones  429
Creating file clones  429
Listing file clones  430
Deleting file clones  431
Splitting file clones from clone parents  431
File clones and disk space management  431
File clones and snapshots  431
File clones and policy files  432
Chapter 29. Scale Out Backup and Restore (SOBAR)  433
Backup procedure with SOBAR  433
Restore procedure with SOBAR  435
Chapter 30. Data Mirroring and Replication  439
General considerations for using storage replication with GPFS  440
Data integrity and the use of consistency groups  440
Handling multiple versions of IBM Spectrum Scale data  440
Continuous Replication of IBM Spectrum Scale data  441
Synchronous mirroring with GPFS replication  441
Synchronous mirroring utilizing storage based replication  451
Point In Time Copy of IBM Spectrum Scale data  459
Chapter 31. Implementing a clustered NFS environment on Linux  463
NFS monitoring  463
NFS failover  463
NFS locking and load balancing  463
CNFS network setup  464
CNFS setup  464
CNFS administration  465
Chapter 32. Implementing Cluster Export Services  467
CES features  467
CES cluster setup  467
CES network configuration  468
CES address failover and distribution policies  469
CES protocol management  470
CES management and administration  471
CES NFS support  471
CES SMB support  473
CES OBJ support  474
Migration of CNFS clusters to CES clusters  477
Chapter 34. Protocols cluster disaster recovery  485
Protocols cluster disaster recovery limitations and prerequisites  485
Example setup for protocols disaster recovery  486
Setting up gateway nodes to ensure cluster communication during failover  487
Creating the inband disaster recovery setup  487
Creating the outband disaster recovery setup  489
Performing failover for protocols cluster when primary cluster fails  491
Re-create file export configuration  491
Restore file export configuration  491
Performing failback to old primary for protocols cluster  492
Re-create file protocol configuration for old primary  492
Restore file protocol configuration for old primary  493
Performing failback to new primary for protocols cluster  495
Re-create file protocol configuration for new primary  495
Restore file protocol configuration for new primary  498
Backing up and restoring protocols and CES configuration information  501
Updating protocols and CES configuration information  502
Protocols and cluster configuration data required for disaster recovery  502
Object data required for protocols cluster DR  502
SMB data required for protocols cluster DR  508
NFS data required for protocols cluster DR  510
Authentication related data required for protocols cluster DR  511
CES data required for protocols cluster DR  512
Chapter 35. File Placement Optimizer  515
Distributing data across a cluster  519
FPO pool file placement and AFM  520
Configuring FPO  520
Configuring IBM Spectrum Scale Clusters  520
Basic Configuration Recommendations  525
Configuration and tuning of Hadoop workloads  536
Configuration and tuning of database workloads  537
Configuring and tuning Spark workloads  537
Ingesting data into IBM Spectrum Scale clusters  538
Exporting data out of IBM Spectrum Scale clusters  538
Upgrading FPO  538
Monitoring and administering IBM Spectrum Scale FPO clusters  541
Rolling upgrades  542
The IBM Spectrum Scale FPO cluster  544
Failure detection  546
Disk Failures  546
Node failure  548
Handling multiple nodes failure  550
Network switch failure  551
Data locality  551
Disk Replacement  560
Auto recovery  562
Failure and recovery  562
QoS support for autorecovery  564
Restrictions  564
Chapter 36. Encryption  565
Encryption keys  565
Encryption policies  566
Encryption policy rules  566
Preparation for encryption  571
Establishing an encryption-enabled environment  576
Simplified setup: Using SKLM with a self-signed certificate  577
Simplified setup: Using SKLM with a certificate chain  584
Simplified setup: Valid and invalid configurations  593
Simplified setup: Accessing a remote file system  596
Simplified setup: Doing other tasks  600
Regular setup: Using SKLM with a self-signed certificate  606
Regular setup: Using SKLM with a certificate chain  614
Configuring encryption with SKLM v2.7 or later  623
Configuring encryption with the Vormetric DSM key server  626
| Certificate expiration warnings  633
Renewing client and server certificates  636
Certificate expiration errors  636
Renewing expired server certificates  637
Renewing expired client certificates  643
| Encryption hints  647
Secure deletion  648
Encryption and standards compliance  649
Encryption and FIPS-140-2 certification  650
Encryption and NIST SP800-131A compliance  650
Encryption in a multicluster environment  650
Encryption in a Disaster Recovery environment  650
Encryption and backup/restore  651
Encryption and snapshots  651
Encryption and a local read-only cache (LROC) device  651
| Encryption and external pools  652
Encryption requirements and limitations  652
Chapter 37. Managing certificates to secure communications between GUI web server and web browsers  655
Enabling secured connection between the IBM Spectrum Scale system and authentication server  659
Securing data transfer  662
Securing NFS data transfer  662
Securing SMB data transfer  665
Secured object data transfer  665
Data security limitations  665
Chapter 39. Cloud services: Transparent cloud tiering and Cloud data sharing  667
Administering files for Transparent cloud tiering  667
Applying a policy on a Transparent cloud tiering node  667
Migrating files to the cloud storage tier  670
Pre-migrating files to the cloud storage tier  670
Recalling files from the cloud storage tier  672
Reconciling files between IBM Spectrum Scale file system and cloud storage tier  672
Cleaning up files transferred to the cloud storage tier  674
Deleting cloud objects  674
Managing reversioned files  675
Listing files migrated to the cloud storage tier  676
Restoring files  676
Restoring Cloud services configuration  678
Checking the Cloud services database integrity  678
Manual recovery of Transparent cloud tiering database  679
Scale out backup and restore (SOBAR) for Cloud services  679
Cloud data sharing  692
Listing files exported to the cloud  693
Importing cloud objects exported through an old version of Cloud data sharing  696
Administering Transparent cloud tiering and Cloud data sharing services  696
Stopping Cloud services software  696
Monitoring the health of Cloud services software  697
Checking the Cloud services version  698
Known limitations of Cloud services  699
Chapter 40. Managing file audit logging  701
Starting consumers in file audit logging  701
Stopping consumers in file audit logging  701
Displaying topics that are registered in the message queue for file audit logging  701
Enabling file audit logging on a new spectrumscale cluster node  702
| Managing the list of monitored events  702
| Designating additional broker nodes for increased performance  703
Tables
1. IBM Spectrum Scale library information units  xiv
2. Conventions  xxiv
| 3. List of changes in documentation  xxxiv
4. Configuration attributes on the mmchconfig command  6
5. Attributes and default values  70
6. Supported Components  71
7. Configuration parameters at cache and their default values at the cache cluster  93
8. Configuration parameters at cache and their default values at the cache cluster - Valid values  96
9. Configuration parameters at cache for parallel I/O  97
10. Configuration parameters at cache for parallel I/O - valid values  98
11. Configuration parameters at primary and their default values  99
12. Configuration parameters at primary and their default values - Valid values  100
13. Configuration parameters at cache for parallel I/O  101
14. Configuration parameters at cache for parallel I/O - valid values  102
15. NFS server parameters  104
16. COMPRESSION and illCompressed flags  131
17. Set QoS classes to unlimited  135
18. Allocate the available IOPS  135
19. Authentication requirements for each file access protocol  208
20. Object services and object protocol nodes  247
21. Object input behavior in unified_mode  263
22. Configuration options for [swift-constraints] in swift.conf  279
23. Configurable options for [DEFAULT] in object-server-sof.conf  281
24. Configurable options for [capabilities] in spectrum-scale-object.conf  282
25. Configuration options for [DEFAULT] in spectrum-scale-objectizer.conf  282
26. Configuration options for [IBMOBJECTIZER-LOGGER] in spectrum-scale-objectizer.conf  283
27. Configuration options for object-server.conf  283
28. Configuration options for /etc/sysconfig/memcached  283
29. Configuration options for proxy-server.conf  283
30. mkldap command parameters  309
31. Removal of a file with ACL entries DELETE and DELETE_CHILD  317
32. Mapping from SMB Security Descriptor to NFSv4 ACL entry  321
33. Mapping from NFSv4 ACL entry to SMB Security Descriptor  321
34. ACL permissions required to work on files and directories, while using SMB protocol (table 1 of 2)  324
35. ACL permissions required to work on files and directories, while using SMB protocol (table 2 of 2)  324
36. ACL permissions required to work on files and directories, while using NFS protocol (table 1 of 2)  325
37. ACL permissions required to work on files and directories, while using NFS protocol (table 2 of 2)  326
38. Commands and reference to manage ACL tasks  328
39. ACL options that are available to manipulate object read ACLs  333
40. Summary of commands to set up cross-cluster file system access  356
41. The effects of file operations on an immutable file or an appendOnly file  418
42. IAM modes and their effects on file operations on immutable files  419
43. Example for retention period  426
44. Example - Time stamp of snapshots that are retained based on the retention policy  426
45. Valid EncParamString values  567
46. Valid combine parameter string values  567
47. Valid wrapping parameter string values  567
48. Required version of IBM Spectrum Scale  572
49. Remote Key Management servers  572
50. The RKM.conf file  574
51. The client keystore directory  576
52. Configuring a node for encryption in the simplified setup  580
53. Configuring a node for encryption in the simplified setup  588
54. Setup of Cluster1 and Cluster2  596
55. Managing another key server  601
| 56. Frequency of warnings  635
57. Comparing default lifetimes of key server and key client certificates  636
58. Security features that are used to secure authentication server  657
59. Sample policy list  669
60. Parameter description  690
61. Parameter description  690
62. Parameter description  691
63. Parameter description  691
64. Parameter description  692
65. IBM Spectrum Scale port usage  716
66. Firewall related information  718
67. Recommended port numbers that can be used for installation  719
68. Recommended port numbers that can be used for internal communication  720
69. Recommended port numbers for NFS access  721
About this information
IBM Spectrum Scale is a file management infrastructure based on IBM® General Parallel File System
(GPFS™) technology that provides unmatched performance and reliability with scalable access to
critical file data.
To find out which version of IBM Spectrum Scale is running on a particular AIX node, enter:
lslpp -l gpfs\*
To find out which version of IBM Spectrum Scale is running on a particular Linux node, enter:
rpm -qa | grep gpfs (for SLES and Red Hat Enterprise Linux)
dpkg -l | grep gpfs (for Ubuntu Linux)
To find out which version of IBM Spectrum Scale is running on a particular Windows node, open
Programs and Features in the control panel. The IBM Spectrum Scale installed program name includes
the version number.
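For example, on a Red Hat Enterprise Linux node the query might return package names similar to the
following (the exact package list and fix levels vary by installation):
rpm -qa | grep gpfs
gpfs.base-5.0.2-0.x86_64
gpfs.gpl-5.0.2-0.noarch
gpfs.gskit-8.0.50-86.x86_64
gpfs.msg.en_US-5.0.2-0.noarch
gpfs.docs-5.0.2-0.noarch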
Which IBM Spectrum Scale information unit provides the information you need?
The IBM Spectrum Scale library consists of the information units listed in Table 1 on page xiv.
To use these information units effectively, you must be familiar with IBM Spectrum Scale and the AIX,
Linux, or Windows operating system, or all of them, depending on which operating systems are in use at
your installation. Where necessary, these information units provide some background information relating
to AIX, Linux, or Windows. However, more commonly they refer to the appropriate operating system
documentation.
Note: Throughout this documentation, the term “Linux” refers to all supported distributions of Linux,
unless otherwise specified.
Planning
v Planning for GPFS
v Planning for protocols
v Planning for Cloud services
v Firewall recommendations
v Considerations for GPFS applications
Configuring
v Configuring the GPFS cluster
v Configuring the CES and protocol configuration
v Configuring and tuning your system for GPFS
v Parameters for performance tuning and optimization
v Ensuring high availability of the GUI service
v Configuring and tuning your system for Cloud services
v Configuring file audit logging
v Configuring Active File Management
v Configuring AFM-based DR
v Tuning for Kernel NFS backend on AFM and AFM DR
Administering
v Performing GPFS administration tasks
v Verifying network operation with the mmnetverify command
v Managing file systems
v File system format changes between versions of IBM Spectrum Scale
v Managing disks
v Managing protocol services
v Managing protocol user authentication
v Managing protocol data exports
v Managing object storage
v Managing GPFS quotas
v Managing GUI users
v Managing GPFS access control lists
v Considerations for GPFS applications
v Accessing a remote GPFS file system
Troubleshooting
v Best practices for troubleshooting
v Understanding the system limitations
v Collecting details of the issues
v Managing deadlocks
v Installation and configuration issues
v Upgrade issues
v Network issues
v File system issues
v Disk issues
v Security issues
v Protocol issues
v Disaster recovery issues
v Performance issues
v GUI issues
v AFM issues
v AFM DR issues
v Transparent cloud tiering issues
v File audit logging issues
| v Troubleshooting watch folder
v Maintenance procedures
v Recovery procedures
v Support for troubleshooting
v References
For the latest support information, see the IBM Spectrum Scale FAQ in IBM Knowledge Center
(www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html).
Note: Users of IBM Spectrum Scale for Windows must be aware that on Windows, UNIX-style file
names need to be converted appropriately. For example, the GPFS cluster configuration data is stored in
the /var/mmfs/gen/mmsdrfs file. On Windows, the UNIX namespace starts under the %SystemDrive%\
cygwin64 directory, so the GPFS cluster configuration data is stored in the C:\cygwin64\var\mmfs\gen\
mmsdrfs file.
Table 2. Conventions
bold
    Bold words or characters represent system elements that you must use literally, such as
    commands, flags, values, and selected menu options.
    Depending on the context, bold typeface sometimes represents path names, directories, or file
    names.
italic
    Italic words or characters represent variable values that you must supply.
    Italics are also used for information unit titles, for the first use of a glossary term, and for
    general emphasis in text.
<key>
    Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For
    example, <Enter> refers to the key on your terminal or workstation that is labeled with the
    word Enter.
\
    In command examples, a backslash indicates that the command or coding example continues
    on the next line. For example:
    mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" \
    -E "PercentTotUsed < 85" -m p "FileSystem space used"
{item}
    Braces enclose a list from which you must choose an item in format and syntax descriptions.
[item]
    Brackets enclose optional items in format and syntax descriptions.
<Ctrl-x>
    The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means
    that you hold down the control key while pressing <c>.
item...
    Ellipses indicate that you can repeat the preceding item one or more times.
|
    In synopsis statements, vertical lines separate a list of choices. In other words, a vertical line
    means Or.
    In the left margin of the document, vertical lines indicate technical changes to the
    information.
Note: For CLI options that accept a list of option values, delimit the values with a comma and no spaces
between them. For example, to display the state of three nodes, use mmgetstate -N NodeA,NodeB,NodeC.
Exceptions to this syntax are listed specifically within the command.
Include the publication title and order number, and, if applicable, the specific location of the information
about which you have comments (for example, a page number or a table number).
To contact the IBM Spectrum Scale development organization, send your comments to the following
e-mail address:
| Summary of changes
| for IBM Spectrum Scale version 5.0.2
| as updated, October 2018
| This release of the IBM Spectrum Scale licensed program and the IBM Spectrum Scale library includes the
| following improvements. All improvements are available after an upgrade, unless otherwise specified.
| AFM and AFM DR-related changes
| v Enabled user-defined gateway node assignment to AFM and AFM DR filesets by modifying
| afmHashVersion value to 5 and adding the gateway node as afmGateway. For more information,
| see the topics mmchfileset command and mmcrfileset command in the IBM Spectrum Scale: Command
| and Programming Reference.
| v Added new options to mmafmctl prefetch. For more information, see the topic mmafmctl
| command in the IBM Spectrum Scale: Command and Programming Reference.
| v Read-Only NFS export is supported for AFM RO mode filesets. For more information, see the
| topic Introduction to Active File Management (AFM) in the IBM Spectrum Scale: Concepts, Planning,
| and Installation Guide.
| Authentication-related changes
| The --password, --ks-admin-pwd, and --ks-swift-pwd parameters are removed from the mmuserauth
| CLI command. For more information, see the topic mmuserauth command in the IBM Spectrum
| Scale: Command and Programming Reference.
| Big data and analytics changes
| For information on changes in IBM Spectrum Scale Big Data and Analytics support, see Big Data
| and Analytics - summary of changes.
| Cloud services changes
| Cloud services has the following updates:
| v Support for RHEL 7.4 and 7.5 on both Power® and x86 machines.
| v Support for Openstack Swift 2.13, IBM Cloud Object Storage 3.13.4.40, and Swift3 2.13
| File audit logging updates
| File audit logging has the following updates:
| v Listing or viewing the contents of directories within file audit logging enabled file systems will
| produce OPEN and CLOSE events in the audit logs. For more information, see JSON reporting
| issues in file audit logging in the IBM Spectrum Scale: Problem Determination Guide.
| v Added option to enable and disable file audit logging from the IBM Spectrum Scale
| management GUI. You can enable file audit logging at the file system level while creating or
| modifying a file system from the Files > File Systems page. For more information, see Enabling
| and disabling file audit logging using the GUI in the IBM Spectrum Scale: Administration Guide.
| v Support for Linux on Z (RHEL 7.x, Ubuntu 16.04 and Ubuntu 18.04 on s390x).
| v Multi-cluster/remote mount is supported. For more information, see Remotely mounted file
| systems in file audit logging in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
| v Improved monitoring of Kafka producers.
| v Subset of events is supported.
Note: In IBM Spectrum Scale V4.1.1 and later, many of these tasks can also be handled by the installation
toolkit configuration options. For more information on the installation toolkit, see the Using the spectrumscale
installation toolkit to perform installation tasks: Explanations and examples topic in the IBM Spectrum Scale:
Concepts, Planning, and Installation Guide.
For information on RAID administration, see IBM Spectrum Scale RAID: Administration.
For more information, see mmcrcluster command in IBM Spectrum Scale: Command and Programming
Reference.
For details on how GPFS clusters are created and used, see the GPFS cluster creation considerations topic in
the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
For more usage information, see mmlscluster command in IBM Spectrum Scale: Command and Programming
Reference.
If the cluster uses a server-based repository, the command also displays the following information:
v The primary GPFS cluster configuration server
v The secondary GPFS cluster configuration server
You must follow these rules when adding nodes to a GPFS cluster:
v You may issue the command only from a node that already belongs to the GPFS cluster.
v A node may belong to only one GPFS cluster at a time.
v The nodes must be available for the command to be successful. If any of the nodes listed are not
available when the command is issued, a message listing those nodes is displayed. You must correct
the problem on each node and reissue the command to add those nodes.
v After the nodes are added to the cluster, you must use the mmchlicense command to designate
appropriate GPFS licenses to the new nodes.
To add node k164n01.kgn.ibm.com to the GPFS cluster, issue the following command:
mmaddnode -N k164n01.kgn.ibm.com
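After the node is added, designate its license with the mmchlicense command; for example, assuming
the new node is to be a client node:
mmchlicense client --accept -N k164n01.kgn.ibm.com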
You can also use the installation toolkit to add nodes. For more information, see Adding nodes, NSDs, or
file systems to an installation process in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
For complete usage information, see mmaddnode command, mmlscluster command and mmchlicense command
in IBM Spectrum Scale: Command and Programming Reference.
For example, to delete the nodes listed in a file named nodes_to_delete, issue:
mmdelnode -N nodes_to_delete
where nodes_to_delete contains the nodes k164n01 and k164n02. The system displays information
similar to the following:
Verifying GPFS is stopped on all affected nodes ...
mmdelnode: Command successfully completed
mmdelnode: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
3. To confirm the deletion of the nodes, issue the following command:
mmlscluster
For information on deleting protocol nodes (CES nodes) from a cluster, see “Deleting a Cluster Export
Services node from an IBM Spectrum Scale cluster” on page 34.
For complete usage information, see mmdelnode command and mmlscluster command in IBM Spectrum Scale:
Command and Programming Reference.
Exercise caution when shutting down GPFS on quorum nodes or deleting quorum nodes from the GPFS
cluster. If the number of remaining quorum nodes falls below the requirement for a quorum, you will be
unable to perform file system operations. For more information on quorum, see the section on Quorum, in
the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
After you have configured the GPFS cluster, you can change configuration attributes with the
mmchcluster command or the mmchconfig command. For more information, see the following topics:
v mmchcluster command in IBM Spectrum Scale: Command and Programming Reference
v mmchconfig command in IBM Spectrum Scale: Command and Programming Reference
If you are using the traditional server-based (non-CCR) configuration repository, you can also do the
following tasks:
v Change the primary or secondary GPFS cluster configuration server nodes. The primary or secondary
server may be changed to another node in the GPFS cluster. That node must be available for the
command to be successful.
Attention: If during the change to a new primary or secondary GPFS cluster configuration server, one
or both of the old server nodes are down, it is imperative that you run the mmchcluster -p LATEST
command as soon as the old servers are brought back online. Failure to do so may lead to disruption
in GPFS operations.
v Synchronize the primary GPFS cluster configuration server node. If an invocation of the mmchcluster
command fails, you will be prompted to reissue the command and specify LATEST on the -p option to
synchronize all of the nodes in the GPFS cluster. Synchronization instructs all nodes in the GPFS
cluster to use the most recently specified primary GPFS cluster configuration server.
For example, to change the primary server for the GPFS cluster data, enter:
mmchcluster -p k164n06
Attention: The mmchcluster command, when issued with either the -p or -s option, is designed to
operate in an environment where the current primary and secondary GPFS cluster configuration servers
are not available. As a result, the command can run without obtaining its regular serialization locks. To
assure smooth transition to a new cluster configuration server, no other GPFS commands (mm...
commands) should be running when the command is issued nor should any other command be issued
until the mmchcluster command has successfully completed.
For complete usage information, see mmchcluster command and mmlscluster command in IBM Spectrum
Scale: Command and Programming Reference
Table 4 details the GPFS cluster configuration attributes which can be changed by issuing the
mmchconfig command. Variations under which these changes take effect are noted:
1. Take effect immediately and are permanent (-i).
2. Take effect immediately but do not persist when GPFS is restarted (-I).
3. Require that the GPFS daemon be stopped on all nodes for the change to take effect.
4. May be applied to only a subset of the nodes in the cluster.
For more information on the release history of tuning parameters, see “Tuning parameters change
history” on page 47.
Table 4. Configuration attributes on the mmchconfig command
Attribute name and Description | -i option allowed | -I option allowed | GPFS must be stopped on all nodes | List of NodeNames allowed | Change takes effect
adminMode | yes | no | no | no | immediately
Specify the nodes you want to target for change and the attributes with their new values on the
mmchconfig command. For example, to change the pagepool value for each node in the GPFS cluster
immediately, enter:
mmchconfig pagepool=100M -i
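You can also restrict a change to a subset of the nodes; for example, to set a larger pagepool on two
specific nodes (the node names are examples only) and then verify the result:
mmchconfig pagepool=4G -i -N c13c1apv7,c13c1apv8
mmlsconfig pagepool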
For complete usage information, see mmchconfig command in IBM Spectrum Scale: Command and
Programming Reference.
Security mode
The security mode of a cluster determines the level of security that the cluster provides for
communications between nodes in the cluster and also for communications between clusters.
For both the AUTHONLY mode and the cipher mode, the cluster automatically generates a
public/private key pair when the mode is set. However, for communication between clusters, the system
administrators are still responsible for exchanging public keys.
In IBM Spectrum Scale V4.2 or later, the default security mode is AUTHONLY. The mmcrcluster
command sets the mode when it creates the cluster. You can display the security mode by running the
following command:
mmlsconfig cipherlist
You can change the security mode with the following command:
mmchconfig cipherlist=security_mode
If you are changing the security mode from EMPTY to another mode, you can do so without stopping
the GPFS daemon. However, if you are changing the security mode from another mode to EMPTY, you
must stop the GPFS daemon on all the nodes in the cluster. Change the security mode to EMPTY and
then restart the GPFS daemon.
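For example, to display the current mode and then switch a cluster from EMPTY to AUTHONLY:
mmlsconfig cipherlist
mmchconfig cipherlist=AUTHONLY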
The default security mode is EMPTY in IBM Spectrum Scale V4.1 or earlier and is AUTHONLY in IBM
Spectrum Scale V4.2 or later. If you migrate a cluster from IBM Spectrum Scale V4.1 to V4.2 or later,
verify the resulting security mode with mmlsconfig cipherlist and change it if it does not meet your requirements.
Configuring the security mode to a setting other than EMPTY (that is, either AUTHONLY or a supported
cipher) requires the use of the GSKit toolkit for encryption and authentication. As such, the gpfs.gskit
package, which is available on all Editions, should be installed.
Every administration node in the IBM Spectrum Scale cluster must be able to run administration
commands on any other node in the cluster. Each administration node must be able to do so without the
use of a password and without producing any extraneous messages. Also, most of the IBM Spectrum
Scale administration commands must run at the root level. One solution to meet these requirements is to
configure each node to permit general remote login to its root user ID. However, there are secure
solutions available that do not require root-level login.
You can use sudo, or a sudo-like framework, to enable administration through a non-root user login.
With sudo wrappers, you launch IBM Spectrum Scale administration commands through a sudo wrapper
script. This script uses ssh to log in to the remote node with a non-root ID, and then uses sudo on the
remote node to run the commands with root-level privileges. The root user on an administration node
still needs to be able to log in to all nodes in the cluster as the non-root ID, without being prompted for
a password.
Note:
v Sudo wrappers are not supported on clusters where one or more of the nodes is running the Windows
operating system.
v Sudo wrappers are not supported with clustered NFS (cNFS).
v Sudo wrappers are not supported with Cluster Export Services (CES).
v Sudo wrappers are not supported with file audit logging.
v The installation toolkit is not supported in a sudo wrapper environment.
v Call home is not supported in a sudo wrapper environment.
Configuring sudo
The system administrator must configure sudo by modifying the sudoers file. IBM Spectrum Scale installs
a sample of the modified sudoers file as /usr/lpp/mmfs/samples/sudoers.sample.
Note: The examples in this section have the user name gpfsadmin and the group gpfs.
The following entries, adapted from the sample file, illustrate the required configuration (take the
complete env_keep variable list from /usr/lpp/mmfs/samples/sudoers.sample):
Defaults env_keep += "MMMODE environmentType GPFS_rshPath GPFS_rcpPath mmScriptTrace GPFSCMDPORTRANGE"
# Allow members of the gpfs group to run all commands but only selected commands without a password:
%gpfs ALL=(ALL) PASSWD: ALL, NOPASSWD: /usr/lpp/mmfs/bin/mmremote, /usr/bin/scp, /bin/echo, /usr/lpp/mmfs/bin/mmsdrrestore
Defaults:%gpfs !requiretty
The first line preserves the environment variables that the IBM Spectrum Scale administration
commands need to run. The second line allows the users in the gpfs group to run administration
commands without being prompted for a password. The third line disables requiretty. When this
flag is enabled, sudo blocks the commands that do not originate from a TTY session.
3. Perform the following steps to verify that the sshwrap and scpwrap scripts work correctly.
a. sshwrap is an IBM Spectrum Scale sudo wrapper script for the remote shell command that is
installed with IBM Spectrum Scale. To verify that it works correctly, run the following command
as the gpfsadmin user:
sudo /usr/lpp/mmfs/bin/mmcommon test sshwrap nodeName
[sudo] password for gpfsadmin:
mmcommon test sshwrap: Command successfully completed
Note: Here nodeName is the name of an IBM Spectrum Scale node in the cluster.
b. scpwrap is an IBM Spectrum Scale sudo wrapper script for the remote file copy command that is
installed with IBM Spectrum Scale. To verify that it works correctly, run the following command
as the gpfsadmin user:
sudo /usr/lpp/mmfs/bin/mmcommon test scpwrap nodeName
mmcommon test scpwrap: Command successfully completed
Note: Here nodeName is the name of an IBM Spectrum Scale node in the cluster.
Sudo is now configured to run administration commands without remote root login.
Perform the following steps to configure a new cluster or an existing cluster to call the sudo wrapper
scripts:
v To configure a new cluster to call the sudo wrapper scripts, use these steps.
1. Log in with the user ID. This example uses gpfsadmin as the user ID.
2. Issue the mmcrcluster command with the --use-sudo-wrapper option as shown in the following
example:
$ sudo /usr/lpp/mmfs/bin/mmcrcluster --use-sudo-wrapper -N c13c1apv7:quorum,c13c1apv8
mmcrcluster: Performing preliminary node verification ...
mmcrcluster: Processing quorum and other critical nodes ...
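v To configure an existing cluster to call the sudo wrapper scripts, log in with the non-root admin user
ID and issue the mmchcluster command with the --use-sudo-wrapper option:
$ sudo /usr/lpp/mmfs/bin/mmchcluster --use-sudo-wrapper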
Make the following configuration changes to use the IBM Spectrum Scale management GUI on a cluster
where sudo wrappers are used:
1. Issue the mmchconfig sudoUser=gpfsadmin command to configure the user name.
2. Issue the systemctl restart gpfsgui command to restart the GUI.
Passwordless ssh is set up from the root user on the node where the GUI is running to all the remote
nodes in the cluster. The ssh calls are equivalent to ssh gpfsadmin@destination-node. Therefore, it is not
necessary to set up passwordless ssh between gpfsadmin users on any two nodes. The root user of the
node where the GUI is running must be able to do passwordless ssh to any other node using the
gpfsadmin user login, so unidirectional access from the GUI node to the remote nodes as the gpfsadmin
user is enough.
Note: If sudo wrappers are enabled on the cluster but GUI is not configured for it, the system raises an
event.
To stop using sudo wrappers, run the mmchcluster command with the --nouse-sudo-wrapper option as
shown in the following example:
$ sudo /usr/lpp/mmfs/bin/mmchcluster --nouse-sudo-wrapper
The cluster stops calling the sudo wrapper scripts to run the remote administration commands.
When sudo wrappers are enabled and a root-level background process calls an administration command
directly rather than through sudo, the administration command typically fails. Examples of such a
root-level process are the cron program and IBM Spectrum Scale callback programs. Such processes call
administration commands directly even when sudo wrappers are enabled.
In the failing scenario, the GPFS daemon that processes the administration command encounters a login
error when it tries to run an internal command on another node as the root user. When sudo wrappers
are enabled, nodes typically do not allow root-level logins by other nodes. (That is the advantage of
having sudo wrappers.) When the root-level login fails, the GPFS daemon that is processing the
administration command cannot complete the command and returns an error.
To avoid this problem, you can set the sudoUser attribute to a non-root admin user ID that can log in to
any node in the cluster without being prompted for a password. You can specify the same admin user ID
that you used to configure sudo. For more information on the admin user ID, see “Configuring sudo” on
page 17.
You can set the sudoUser attribute with the following commands, which are described in the IBM Spectrum
Scale: Command and Programming Reference: the mmchconfig command (the sudoUser attribute), the
mmcrcluster command (the --sudo-user parameter), and the mmchcluster command (the --sudo-user parameter).
For more information on node quorum, see the section on Quorum, in the IBM Spectrum Scale: Concepts,
Planning, and Installation Guide
For more information on node quorum with tiebreaker, see the section on Quorum in the IBM Spectrum
Scale: Concepts, Planning, and Installation Guide
When using node quorum with tiebreaker, define one, two, or three disks to be used as tiebreaker disks
when any quorum node is down. Issue this command:
mmchconfig tiebreakerDisks="nsdName;nsdName;nsdName"
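For example, with three existing NSDs named nsd1, nsd2, and nsd3 (the names are illustrative), define
them as tiebreaker disks and then confirm the setting:
mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"
mmlsconfig tiebreakerDisks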
If you are using node quorum with tiebreaker and want to change to using node quorum, issue this
command:
mmchconfig tiebreakerDisks=DEFAULT
For a more detailed discussion on the role of the file system manager node, see Special management
functions in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
The node that is the file system manager can also be used for applications. In some cases involving very
large clusters or applications that place a high stress on metadata operations, it may be useful to specify
which nodes are used as file system managers. Applications that place a high stress on metadata
operations are usually those that involve large numbers of very small files, or that do very fine-grain
parallel write-sharing among multiple nodes.
You can display the file system manager node by issuing the mmlsmgr command. You can display the
information for an individual file system, a list of file systems, or for all of the file systems in the cluster.
For example, to display the file system manager for the file system fs1, enter:
mmlsmgr fs1
The output shows the device name of the file system and the file system manager's node number and
name:
file system      manager node [from 19.134.68.69 (k164n05)]
---------------- ------------------
fs1              19.134.68.70 (k164n06)
For complete usage information, see mmlsmgr command in IBM Spectrum Scale: Command and Programming
Reference.
You can change the file system manager node for an individual file system by issuing the mmchmgr
command. For example, to change the file system manager node for the file system fs1 to k145n32, enter:
mmchmgr fs1 k145n32
The output shows the file system manager's node number and name, in parentheses, as recorded in the
GPFS cluster data:
GPFS: 6027-628 Sending migrate request to current manager node 19.134.68.69 (k145n30).
GPFS: 6027-629 [N] Node 19.134.68.69 (k145n30) resigned as manager for fs1.
GPFS: 6027-630 [N] Node 19.134.68.70 (k145n32) appointed as manager for fs1.
For complete usage information, see mmchmgr command in IBM Spectrum Scale: Command and Programming
Reference.
To determine how long the mmrestripefs command takes to complete, consider these points:
1. The amount of data that potentially needs to be moved. You can estimate this value by issuing the df
command.
2. The number of IBM Spectrum Scale client nodes that are available to do the work.
3. The amount of Network Shared Disk (NSD) server bandwidth that is available for I/O operations.
4. The quality of service for I/O operations (QoS) settings on each node. For more information, see
mmchqos in the IBM Spectrum Scale: Command and Programming Reference.
5. The maximum number of PIT threads on each node. For more information, see the description of the
pitWorkerThreadsPerNode attribute in the topic mmchconfig command in the IBM Spectrum Scale:
Command and Programming Reference.
6. The amount of free space that is available from new disks. If you added new disks, issue the mmdf
command to determine the amount of additional free space that is available.
The restriping of a file system is done by having multiple threads on each node in the cluster work on a
subset of files. If the files are large, multiple nodes can participate in restriping them in parallel. So, the more
GPFS client nodes that are performing work for the restripe operation, the faster the mmrestripefs
command completes. Use the -N parameter to specify the nodes that participate in the restripe operation.
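For example, to rebalance the file system fs1 using only two designated client nodes and the QoS
maintenance class (the node names are illustrative, and the example assumes that QoS has been enabled
with mmchqos):
mmrestripefs fs1 -b -N node1,node2 --qos maintenance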
Based on raw I/O rates, you can estimate the length of time for the restripe operation. However, because
of the need to scan metadata, double that value.
Assuming that enough nodes are available to saturate the disk servers and assuming that all the data
must be moved, the time to read and write every block of data is roughly:
2 * fileSystemSize / averageDiskserverDataRate
As an upper bound, because of the need to scan all of the metadata, double this time. If other jobs are
loading the NSD servers heavily, this time might increase even more.
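For example, with assumed values of a 100 TiB file system (102,400 GiB) and an aggregate NSD server
data rate of 10 GiB per second, the estimate is 2 * 102400 / 10 = 20,480 seconds, or roughly 5.7 hours;
doubling for the metadata scan gives an upper bound of roughly 11.4 hours.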
Note: You do not need to stop all other jobs while the mmrestripefs command is running. The CPU load
of the command is minimal on each node and only the files that are being restriped at any moment are
locked to maintain data integrity.
For new GPFS clusters, see Steps to establishing and starting your GPFS cluster in the IBM Spectrum Scale:
Concepts, Planning, and Installation Guide.
For existing GPFS clusters, before starting GPFS, ensure that you have:
1. Verified the installation of all prerequisite software.
2. Compiled the GPL layer, if Linux is being used.
| Tip: You can configure a cluster to rebuild the GPL automatically whenever a new level of the Linux
| kernel is installed or whenever a new level of IBM Spectrum Scale is installed. This feature is
| available only on the Linux operating system. For more information, see the description of the
| autoBuildGPL attribute in the topic mmchconfig command in the IBM Spectrum Scale: Command and
| Programming Reference.
Start the daemons on all of the nodes in the cluster by issuing the mmstartup -a command:
mmstartup -a
Check the messages recorded in /var/adm/ras/mmfs.log.latest on one node for verification. Look for
messages similar to this:
GPFS: 6027-300 [N] mmfsd ready
This indicates that quorum has been formed and this node has successfully joined the cluster, and is now
ready to mount file systems.
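You can also confirm that the daemon is active on every node; for example:
mmgetstate -a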
If GPFS does not start, see GPFS daemon will not come up in IBM Spectrum Scale: Problem Determination
Guide.
For complete usage information, see mmstartup command in IBM Spectrum Scale: Command and Programming
Reference.
If it becomes necessary to stop GPFS, you can do so from the command line by issuing the mmshutdown
command:
mmshutdown -a
For complete usage information, see mmshutdown command in IBM Spectrum Scale: Command and
Programming Reference.
After performing these steps, depending on your operating system, shut down your servers accordingly.
Before shutting down and powering up your servers, consider the following:
v You must shut down NSD servers before the storage subsystem. While powering up, the storage
subsystem must be online before NSD servers are up so that LUNs are visible to them.
v In a power-on scenario, verify that all network and storage subsystems are fully operational before
bringing up any IBM Spectrum Scale nodes.
v On the Power platform, you must shut down operating systems for LPARs first and then power off
servers using Hardware Management Console (HMC). HMC must be the last to be shut down and the
first to be powered up.
v It is preferable to shut down your Ethernet and InfiniBand switches using the management console
instead of powering them off. In any case, network infrastructure such as switches or extenders must
be powered off last.
v After starting up again, verify that functions such as AFM and policies are operational. You might need
to manually restart some functions.
v There are a number of other GPFS functions that could be interrupted by a shutdown. Ensure that you
understand what else might need to be verified, depending on your environment.
Some of the CES and protocol configuration steps might have been completed already through the IBM
Spectrum Scale installer. To verify, see the information about the IBM Spectrum Scale installer and
protocol configuration in the topic spectrumscale command in the IBM Spectrum Scale: Command and
Programming Reference guide.
For more information on the CES features, see Chapter 32, “Implementing Cluster Export Services,” on
page 467.
The CES shared root (cesSharedRoot) is needed for storing CES shared configuration data, for protocol
recovery, and for other protocol-specific purposes. It is part of the cluster export configuration and is
shared between the protocols. Every CES node requires access to the path configured as shared root.
The mmchconfig command is used to configure this directory as part of setting up a CES cluster.
The cesSharedRoot cannot be changed while any CES nodes are up and running. You need to bring down
all CES nodes if you want to modify the shared root configuration.
The cesSharedRoot is monitored by the mmsysmonitor. If the shared root is not available, the CES node list
(mmces node list) will show "no-shared-root" and a failover is triggered.
The cesSharedRoot cannot be unmounted when the CES cluster is up and running. You need to bring all
CES nodes down if you want to unmount cesSharedRoot (for example, for doing service action like fsck).
The recommendation for CES shared root is a dedicated file system (but this is not enforced). It can also
be a part (path) of an existing GPFS file system. A dedicated file system can be created with the mmcrfs
command. In any case, CES shared root must reside on GPFS and must be available when it is configured
through mmchconfig.
If not already done through the installer, it is recommended that you create a file system for the CES.
Some protocol services share information through a cluster-wide file system. It is recommended to use a
separate file system for this purpose.
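For example, a small dedicated file system for this purpose might be created with a command similar to
the following sketch, where fs0 is an example device name and fs0nsd.stanza is an assumed NSD stanza
file:
mmcrfs fs0 -F fs0nsd.stanza -T /gpfs/fs0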
To set up CES, change the configuration to use the new file system:
mmchconfig cesSharedRoot=/gpfs/fs0
Note:
v After GPFS starts back up, CES can be enabled on the cluster because cesSharedRoot is now defined.
v If file audit logging is already enabled for the file system that you defined for cesSharedRoot, you need
to disable and then enable file audit logging for that file system again.
mmaudit Device disable
mmaudit Device enable
If not already done during the installation, this must be done before configuring any protocols. Nodes
that should participate in the handling of protocol exports need to be configured as CES nodes.
For each of the nodes that should handle protocol exports, run:
mmchnode --ces-enable -N nodename
After configuring all nodes, verify that the list of CES nodes is complete:
mmces node list
CES nodes may be assigned to CES groups. A CES group is identified by a group name consisting of
lowercase alphanumeric characters. CES groups may be used to manage CES node and address
assignments.
The group assignment may also be specified when the node is enabled for CES by issuing the following
command:
mmchnode --ces-enable --ces-group group1,group2 -N node
The node may be removed from a group at any time by issuing the following command:
mmchnode --noces-group group1 -N node
For more information, see mmchnode command in IBM Spectrum Scale: Command and Programming Reference.
Use mmces address add --ces-ip 192.168.6.6 to add an IP address to the CES IP address pool. The IP
address will be assigned to a CES node according to the CES address distribution policy.
After adding all desired CES protocol service IP addresses, verify the configuration:
mmces address list
CES addresses can be assigned to CES groups. A CES group is identified by a group name consisting of
alphanumeric characters which are case-sensitive. Addresses can be assigned to a group when they are
defined by issuing the following command:
mmces address add --ces-ip 192.168.6.6 --ces-group group1
A CES address that is associated with a group must be assigned only to a node that is also associated
with the same group. A node can belong to multiple groups while an address cannot.
As an example, consider a configuration with three nodes. All three nodes can host addresses on subnet
A, and two of the nodes can host addresses on subnet B. The nodes must have an existing non-CES IP
address on the same subnet configured on the interfaces intended to be used for the CES IPs. Also, four
addresses are defined, two on each subnet.
Node1: groups=subnetA,subnetB
Node2: groups=subnetA,subnetB
Node3: groups=subnetA
Address1: subnetA
Address2: subnetA
Address3: subnetB
Address4: subnetB
In this example, Address1 and Address2 can be assigned to any of the three nodes, but Address3 and
Address4 can be assigned to only Node1 or Node2.
If an address is assigned to a group for which there are no healthy nodes, the address will remain
unassigned until a node in the same group becomes available.
Addresses without a group assignment can be assigned to any node. Therefore, it is necessary to use a
group for each subnet when multiple subnets exist.
For more information, see mmces command in IBM Spectrum Scale: Command and Programming Reference.
Virtual LANs (VLANs) are often associated with secure networks because they provide a means of
separating network devices into independent networks. Although the physical network infrastructure is
shared, unicast, multicast, and broadcast traffic from a network device in a VLAN is restricted to other
devices within that same VLAN.
CES IPs are automatically assigned and aliased to existing network adapters on protocol nodes during
startup. The following example shows aliased CES IPs in a flat network environment or a single VLAN
environment in which the switch ports are set to Access mode and thus, do not need VLAN tagging.
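The following sketch of ip addr output is provided only for illustration; the eth1 addresses match the
description that follows, while the MAC addresses, MTU values, and the data0 address are placeholders:
eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
    link/ether 00:50:56:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.11.1.122/24 brd 10.11.1.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.11.1.5/24 brd 10.11.1.255 scope global secondary eth1:0
       valid_lft forever preferred_lft forever
    inet 10.11.1.7/24 brd 10.11.1.255 scope global secondary eth1:1
       valid_lft forever preferred_lft forever
data0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP
    link/ether 00:50:56:00:00:02 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.122/24 brd 192.168.1.255 scope global data0
       valid_lft forever preferred_lft forever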
In the preceding example, eth1 preexists with an established route and IP: 10.11.1.122. This is manually
assigned and must be accessible prior to any CES configuration. Once CES services are active, CES IPs
are then automatically aliased to this base adapter, thus creating eth1:0 and eth1:1. The floating CES IPs
assigned to the aliases are 10.11.1.5 and 10.11.1.7. Both CES IPs are allowed to move to other nodes in
case of a failure. This automatic movement combined with the ability to manually move CES IPs, might
cause a variance in the number of aliases and CES IPs among protocol nodes. The data0 interface
illustrates how a network used for GPFS intra-cluster connectivity between nodes can be separate from
the adapter used for CES IPs.
Example distribution of CES IPs among two protocol nodes after enablement of protocols
mmces address list
Address Node Group Attribute
-------------------------------------------------------------------------
10.11.1.5 protocol-node-1 none none
10.11.1.6 protocol-node-2 none object_database_node,object_singleton_node
10.11.1.7 protocol-node-1 none none
10.11.1.8 protocol-node-2 none none
A network switch port can be considered a trunk port if it gives access to multiple VLANs. When this
occurs, it is necessary for a VLAN tag to be added to each frame. This VLAN tag is an identification
allowing switches to contain traffic within specific networks. If multiple networks must access data from
IBM Spectrum Scale protocol nodes, then one possible option is to configure trunk ports on the switch
directly connected to the IBM Spectrum Scale protocol nodes. Once a trunk port exists, VLAN tags are
necessary on the connected network adapters. Note that CES IPs are automatically assigned and aliased
to existing network adapters on protocol nodes during startup. Due to this, the existence of VLAN tags
requires a preexisting network adapter with an established route and IP so that CES IPs can alias to it.
As in the no VLAN tag example, an existing network adapter must be present so that CES IPs can alias
to it. Note that the non-VLAN base adapter eth1 has no IPs assigned. In this example, the preexisting
network adapter with an established route and IP is eth1.3016. The IP for eth1.3016 is 10.30.16.122 and
the VLAN tag is 3016. This preexisting IP can be used for network verification prior to CES IP
configuration by pinging it from external to the cluster or pinging it from other protocol nodes. It is a
good practice to make sure that all protocol node base adapter IPs are accessible before enabling
protocols. The data0 interface shows how a network used for GPFS intra-cluster connectivity between
nodes can be separate from the adapter used for CES IPs.
Example distribution of CES IPs among two protocol nodes after enablement of protocols (with VLAN
tag)
Example of aliased CES IPs using the ip addr command (with multiple VLAN tag)
eth1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN
    link/ether 00:50:56:83:16:e5 brd ff:ff:ff:ff:ff:ff
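The VLAN-tagged portion of such output, reconstructed here only for illustration from the preceding
description (the subnet mask and alias numbering are assumptions), might look like the following; the
untagged eth1 adapter and the data0 interface would also appear as described:
eth1.3016@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
    link/ether 00:50:56:83:16:e5 brd ff:ff:ff:ff:ff:ff
    inet 10.30.16.122/24 brd 10.30.16.255 scope global eth1.3016
       valid_lft forever preferred_lft forever
    inet 10.30.16.5/24 brd 10.30.16.255 scope global secondary eth1.3016:0
       valid_lft forever preferred_lft forever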
Example distribution of CES IPs from multiple VLANs among two protocol nodes after enablement of
protocols
mmces address list
Address Node Group Attribute
-------------------------------------------------------------------------
10.11.80.54 protocol-node-2 none none
10.11.80.55 protocol-node-1 none none
10.30.16.5 protocol-node-1 none none
10.30.16.6 protocol-node-2 none none
10.30.16.7 protocol-node-1 none none
10.30.16.8 protocol-node-2 none none
10.30.17.100 protocol-node-1 none none
10.30.17.101 protocol-node-2 none none
10.30.17.102 protocol-node-2 none object_database_node,object_singleton_node
10.30.17.103 protocol-node-1 none none
For more information, see the mmces node list and mmces address list options in mmces command in
IBM Spectrum Scale: Command and Programming Reference.
For information on how to configure and enable SMB and NFS services, see “Configuring and enabling
SMB and NFS protocol services” on page 177.
To create a fileset, log on to the IBM Spectrum Scale GUI and select Files > Filesets > Create Fileset.
When the file system is intended for CES export, IBM strongly recommends configuring the file system
to allow only NFSv4 ACLs through the -k nfs4 option of mmcrfs. When using the default configuration
profiles (/usr/lpp/mmfs/profiles) that are included with IBM Spectrum Scale, the NFSv4 ACL setting is
already set from the profile configuration (see “Authorizing file protocol users” on page 319 for details).
Also, if quotas are to be used, enable quota usage during file system creation.
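For example, a file system intended for CES export might be created with a command similar to the
following sketch; the device name, NSD stanza file, and mount point are assumptions for illustration:
mmcrfs cesfs -F cesfs_nsd.stanza -k nfs4 -Q yes -T /gpfs/cesfs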
For information on unified file and object access, see Planning for unified file and object access in IBM
Spectrum Scale: Concepts, Planning, and Installation Guide.
Note: Ensure that all GPFS file systems used to export data via NFS are mounted with the syncnfs
option in order to prevent clients from running into data integrity issues during failover. It is
recommended to use the mmchfs command to set the syncnfs option as default when mounting the GPFS
file system.
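For example, to make syncnfs a default mount option for an existing file system (fs0 is an example
device name):
mmchfs fs0 -o syncnfs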
For more information on creating protocol data exports, see File system considerations for the NFS protocol
and Fileset considerations for creating protocol data exports in IBM Spectrum Scale: Concepts, Planning, and
Installation Guide.
For detailed information about using the installation toolkit to configure GPFS and protocols, see the
following:
v spectrumscale command in IBM Spectrum Scale: Command and Programming Reference
v Installing IBM Spectrum Scale on Linux nodes and deploying protocols in IBM Spectrum Scale: Concepts,
Planning, and Installation Guide
For more information, see GPFS cluster creation considerations in IBM Spectrum Scale: Concepts, Planning, and
Installation Guide.
Values suggested here reflect evaluations made at the time this documentation was written. For the latest
system configuration and tuning settings, see the IBM Spectrum Scale FAQ in IBM Knowledge Center
(www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html) and the IBM Spectrum Scale
Wiki (www.ibm.com/developerworks/community/wikis/home/wiki/General Parallel File System
(GPFS)).
For more information on using multiple token servers, see “Using multiple token servers” on page 715.
For the latest system configuration settings, see the IBM Spectrum Scale FAQ in IBM Knowledge Center
(www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html).
Clock synchronization
The clocks of all nodes in the GPFS cluster must be synchronized. If this is not done, NFS access to the
data and other GPFS file system operations may be disrupted.
Cache usage
GPFS creates a number of cache segments on each node in the cluster. The amount of cache is controlled
by three attributes.
These attributes have default values at cluster creation time and may be changed through the
mmchconfig command:
pagepool
The GPFS pagepool attribute is used to cache user data and file system metadata. The pagepool
mechanism allows GPFS to implement read as well as write requests asynchronously. Increasing
the size of the pagepool attribute increases the amount of data or metadata that GPFS can cache
without requiring synchronous I/O. The amount of memory available for GPFS on a particular
node may be restricted by the operating system and other software running on the node.
The optimal size of the pagepool attribute depends on the needs of the application and effective
caching of its re-accessed data. For systems where applications access large files, reuse data,
benefit from GPFS prefetching of data, or have a random I/O pattern, increasing the value for the
pagepool attribute may prove beneficial. However, if the value is set too large, GPFS will start
with the maximum that the system allows. See the GPFS log for the value it is running at.
To change the size of the pagepool attribute to 4 GB:
mmchconfig pagepool=4G
maxFilesToCache
The total number of different files that can be cached at one time. Every entry in the file cache
requires some pageable memory to hold the content of the file's inode plus control data
structures. This is in addition to any of the file's data and indirect blocks that might be cached in
the page pool.
The total amount of memory required for inodes and control data structures can be estimated as:
maxFilesToCache × 3 KB
Valid values of maxFilesToCache range from 1 to 100,000,000. For systems where applications use
a large number of files, of any size, increasing the value for maxFilesToCache may prove
beneficial. This is particularly true for systems where a large number of small files are accessed.
The value should be large enough to handle the number of concurrently open files plus allow
caching of recently used files.
If the user does not specify a value for maxFilesToCache, the default value is 4000.
maxStatCache
This parameter sets aside additional pageable memory to cache attributes of files that are not
currently in the regular file cache. This is useful to improve the performance of both the system
and GPFS stat() calls for applications with a working set that does not fit in the regular file cache.
The total amount of memory GPFS uses to cache file data and metadata is arrived at by adding pagepool
to the amount of memory required to hold inodes and control data structures (maxFilesToCache × 3 KB),
and the memory for the stat cache (maxStatCache × 400 bytes) together. The combined amount of
memory to hold inodes, control data structures, and the stat cache is limited to 50% of the physical
memory on a node running GPFS.
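As a rough worked example, assume a node with pagepool=4G, maxFilesToCache=100000, and
maxStatCache=20000 (values chosen only for illustration):
inodes and control data structures: 100,000 × 3 KB = approximately 293 MB
stat cache: 20,000 × 400 bytes = approximately 8 MB
total GPFS cache: 4 GB + 293 MB + 8 MB = approximately 4.3 GB
In this example, the roughly 300 MB used for inodes, control data structures, and the stat cache must fit
within the 50% limit described above.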
During configuration, you can specify the maxFilesToCache, maxStatCache, and pagepool attributes that
control how much cache is dedicated to GPFS. These values can be changed later, so experiment with
larger values to find the optimum cache size that improves GPFS performance without negatively
affecting other applications.
The mmchconfig command can be used to change the values of maxFilesToCache, maxStatCache, and
pagepool. The pagepool parameter is the only one of these parameters that may be changed while the
GPFS daemon is running. A change to the pagepool attribute occurs immediately when using the -i
option on the mmchconfig command. Changes to the other values are effective only after the daemon is
restarted.
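For example, to increase the page pool to 8 GB immediately on the NSD server nodes (nsdNodes here
stands for the NSD server nodes; substitute a node list or node class appropriate to your cluster):
mmchconfig pagepool=8G -i -N nsdNodes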
For further information on these cache settings for GPFS, refer to GPFS and memory in IBM Spectrum Scale:
Concepts, Planning, and Installation Guide.
A token allows a node to cache data it has read from disk, because the data cannot be modified
elsewhere without revoking the token first. Each token manager can handle approximately 300,000
different file tokens (this number depends on how many distinct byte-range tokens are used when
multiple nodes access the same file). If you divide the 300,000 by the number of nodes in the GPFS
cluster, you get a value that should approximately equal maxFilesToCache (the total number of different
files that can be cached at one time) + maxStatCache (additional pageable memory to cache file attributes
that are not currently in the regular file cache).
Access patterns
GPFS attempts to recognize the pattern of accesses (such as strided sequential access) that an application
makes to an open file. If GPFS recognizes the access pattern, it will optimize its own behavior.
For example, GPFS can recognize sequential reads and will retrieve file blocks before they are required by
the application. However, in some cases GPFS does not recognize the access pattern of the application or
cannot optimize its data transfers. In these situations, you may improve GPFS performance if the
application explicitly discloses aspects of its access pattern to GPFS through the gpfs_fcntl() library call.
GPFS supports the use of aggregate network interfaces such as EtherChannel and IEEE 802.3ad link
aggregation. The main benefit is increased bandwidth: the aggregated interface has a network bandwidth
close to the total bandwidth of all its physical adapters. Another
benefit is improved fault tolerance. If a physical adapter fails, the packets are automatically sent on the
next available adapter without service disruption.
EtherChannel and IEEE 802.3ad each require support within the Ethernet switch. Refer to the product
documentation for your switch to determine if EtherChannel is supported.
For details on how to configure EtherChannel and IEEE 802.3ad Link Aggregation and verify whether the
adapter and the switch are operating with the correct protocols for IEEE 802.3ad, consult the operating
system documentation.
Hint: Make certain that the switch ports are configured for LACP (the default is PAGP).
Hint: A useful command for troubleshooting, where device is the Link Aggregation device, is:
entstat -d device
Swap space
It is important to configure a swap space that is large enough for the needs of the system.
While the actual configuration decisions should be made taking into account the memory requirements of
other applications, it is a good practice to configure at least as much swap space as there is physical
memory on a given node.
For the latest system configuration and tuning settings, see the IBM Spectrum Scale FAQ in IBM
Knowledge Center (www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html) and the
IBM Spectrum Scale Wiki (www.ibm.com/developerworks/community/wikis/home/wiki/General
Parallel File System (GPFS)).
For more configuration and tuning considerations for Linux nodes, see the following topics:
1. “updatedb considerations”
2. “Memory considerations”
3. “GPFS helper threads”
4. “Communications I/O” on page 42
5. “Disk I/O” on page 42
updatedb considerations
On some Linux distributions, the system is configured by default to run the file system indexing utility
updatedb through the cron daemon on a periodic basis (usually daily).
This utility traverses the file hierarchy and generates a large I/O load. For this reason, it is configured by
default to skip certain file system types and nonessential file systems. However, the default configuration
does not prevent updatedb from traversing GPFS file systems. In a cluster this results in multiple
instances of updatedb traversing the same GPFS file system simultaneously. This causes general file
system activity and lock contention in proportion to the number of nodes in the cluster. On smaller
clusters, this may result in a relatively short-lived spike of activity, while on larger clusters, depending on
the overall system throughput capability, the period of heavy load may last longer. Usually the file
system manager node will be the busiest, and GPFS would appear sluggish on all nodes. Re-configuring
the system to either make updatedb skip all GPFS file systems or only index GPFS files on one node in
the cluster is necessary to avoid this problem.
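One possible approach, assuming an mlocate-style /etc/updatedb.conf (the file name and variable names
vary by distribution), is to add the GPFS file system type or mount point to the prune lists on all nodes
but one:
# /etc/updatedb.conf
PRUNEFS="... gpfs"
PRUNEPATHS="... /gpfs"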
Memory considerations
It is recommended that you adjust the vm.min_free_kbytes kernel tunable. This tunable controls the
amount of free memory that Linux kernel keeps available (that is, not used in any kernel caches).
When vm.min_free_kbytes is set to its default value, on some configurations it is possible to encounter
memory exhaustion symptoms when free memory should in fact be available. Setting
vm.min_free_kbytes to 5-6% of the total amount of physical memory, but no more than 2 GB, can
prevent this problem.
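For example, on a node with 64 GiB of memory, 5% would be about 3.3 GiB, so the 2 GB cap applies; a
sketch of the corresponding setting (persist it in /etc/sysctl.conf or an equivalent mechanism for your
distribution):
sysctl -w vm.min_free_kbytes=2097152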
Since systems vary, it is suggested you simulate an expected workload in GPFS and examine available
performance indicators on your system. For instance, some SCSI drivers publish statistics in the /proc/scsi
directory. If your disk driver statistics indicate that there are many queued requests it may mean you
should throttle back the helper threads in GPFS.
Communications I/O
Values suggested here reflect evaluations made at the time this documentation was written. For the latest
system configuration and tuning settings, see the IBM Spectrum Scale Wiki (www.ibm.com/
developerworks/community/wikis/home/wiki/General Parallel File System (GPFS)).
To optimize the performance of GPFS and your network, it is suggested you do the following:
v Enable Jumbo Frames if your switch supports it.
If GPFS is configured to operate over Gigabit Ethernet, set the MTU size for the communication
adapter to 9000.
v Verify /proc/sys/net/ipv4/tcp_window_scaling is enabled. It should be by default.
v Tune the TCP window settings by adding these lines to the /etc/sysctl.conf file:
# increase Linux TCP buffer limits
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
# increase default and maximum Linux TCP buffer sizes
net.ipv4.tcp_rmem = 4096 262144 8388608
net.ipv4.tcp_wmem = 4096 262144 8388608
After these changes are made to the /etc/sysctl.conf file, apply the changes to your system:
1. Issue the sysctl -p /etc/sysctl.conf command to set the kernel settings.
2. Issue the mmshutdown -a command and then issue the mmstartup -a command to restart GPFS.
Disk I/O
To optimize disk I/O performance, you should consider the following options for NSD servers or other
GPFS nodes that are directly attached to a SAN over a Fibre Channel (FC) network.
1. The storage server cache settings can impact GPFS performance if not set correctly.
2. When the storage server disks are configured for RAID5, some configuration settings can affect GPFS
performance. These settings include:
v GPFS block size
v Maximum I/O size of the Fibre Channel host bus adapter (HBA) device driver
v Storage server RAID5 stripe size
Note: For optimal performance, GPFS block size should be a multiple of the maximum I/O size of
the FC HBA device driver. In addition, the maximum I/O size of the FC HBA device driver should be
a multiple of the RAID5 stripe size.
3. These suggestions may avoid the performance penalty of read-modify-write at the storage server for
GPFS writes. Examples of the suggested settings are:
v 8+P RAID5
– GPFS block size = 512K
– Storage Server RAID5 segment size = 64K (RAID5 stripe size=512K)
– Maximum IO size of FC HBA device driver = 512K
v 4+P RAID5
– GPFS block size = 256K
– Storage Server RAID5 segment size = 64K (RAID5 stripe size = 256K)
– Maximum IO size of FC HBA device driver = 256K
For the example settings using 8+P and 4+P RAID5, the RAID5 parity can be calculated from the data
written, which avoids reading from disk to calculate the RAID5 parity.
Note: These changes through the mmchconfig command take effect upon restart of the GPFS daemon.
v The number of AIX AIO kprocs to create should be approximately the same as the GPFS
worker1Threads setting.
v The AIX AIO maxservers setting is the number of kprocs PER CPU. It is suggested to set it slightly
larger than the value of worker1Threads divided by the number of CPUs. For example, if
worker1Threads is set to 500 on a 32-way SMP, set maxservers to 20.
v Set the Oracle database block size equal to the LUN segment size or a multiple of the LUN pdisk
segment size.
v Set the Oracle read-ahead value to prefetch one or two full GPFS blocks. For example, if your GPFS
block size is 512 KB, set the Oracle blocks to either 32 or 64 16 KB blocks.
v Do not use the dio option on the mount command as this forces DIO when accessing all files. Oracle
automatically uses DIO to open database files on GPFS.
v When running Oracle RAC 10g, it is suggested you increase the value for
OPROCD_DEFAULT_MARGIN to at least 500 to avoid possible random reboots of nodes.
In the control script for the Oracle CSS daemon, located in /etc/init.cssd the value for
OPROCD_DEFAULT_MARGIN is set to 500 (milliseconds) on all UNIX derivatives except for AIX.
For AIX this value is set to 100. From a GPFS perspective, even 500 milliseconds may be too low in
situations where node failover may take up to a minute or two to resolve.
Important: Set autoload to no before fixing hardware issues and performing system maintenance.
deadlockDetectionThreshold
When deadlockDetectionThreshold is set to 0, the GPFS deadlock detection feature is disabled. The
default value of this parameter is 300 seconds.
Important: You must enable the GPFS deadlock detection feature to collect debug data and resolve
deadlock issues in a cluster. If deadlock events occur frequently, fix the underlying problem instead of
disabling the feature.
defaultHelperNodes
The nodes that are added to defaultHelperNodes are used in running certain commands, such as
mmrestripefs. Running such commands on a subset of the nodes in a cluster, for example running the
mmrestripefs command on only the NSD server nodes, might give better performance. The default value
of this parameter is all nodes in the cluster.
Important: Set the -N option for GPFS management commands or change the value of
defaultHelperNodes before running the GPFS management commands.
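For example, to restrict helper activity to the NSD server nodes (nsd1 and nsd2 are example node names;
a node class can also be specified):
mmchconfig defaultHelperNodes=nsd1,nsd2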
maxFilesToCache
The maxFilesToCache parameter specifies the number of files that can be cached by each node. The
range of valid values for maxFilesToCache is 1 - 100,000,000. The default value is 4000. The value of
this parameter must be large enough to handle the number of concurrently open files and to allow
the caching of recently used files.
Changing the value of maxFilesToCache affects the amount of memory that is used on the node. In a
large cluster, a change in the value of maxFilesToCache is greatly magnified. Increasing
maxFilesToCache in a large cluster with hundreds of nodes increases the number of tokens a token
manager needs to store. Ensure that the manager node has enough memory and tokenMemLimit is
increased when you are running GPFS version 4.1.1 and earlier. Therefore, increasing the value of
maxFilesToCache on large clusters usually happens on a subset of nodes that are used as log-in nodes,
SMB and NFS exporters, email servers, and other file servers.
For systems on which applications use a large number of files, increasing the value of
maxFilesToCache might be beneficial, especially where a large number of small files are accessed.
Note: Setting the maxFilesToCache parameter to a high value results in a large amount of memory
being allocated for internal data buffering. If the value of maxFilesToCache is set too high, some
operations in IBM Spectrum Scale might not have enough memory to run in. If you have set
maxFilesToCache to a very high value and you see error messages in the mmfs.log file that say that
not enough memory is available to perform an operation, try lowering the value of maxFilesToCache.
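For example, to raise the limit only on a subset of nodes (protocolNodes is an assumed node class name
for the SMB and NFS exporter nodes):
mmchconfig maxFilesToCache=50000 -N protocolNodes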
maxBlockSize
The value of maxBlockSize must be equal to or larger than the maximum block size of all the file
systems in the cluster.
Note: When you migrate a cluster from an earlier version to version 5.0.0 or later, the value of
maxblocksize stays the same. However, if maxblocksize was set to DEFAULT in the earlier version
of the cluster, then migrating it to version 5.0.0 or later sets it explicitly to 1 MiB, which was the
default size in earlier versions. To change maxBlockSize to the default size after migrating to version
5.0.0 or later, set maxblocksize=DEFAULT (4 MiB).
For more information, see the topics mmcrfs and mmchconfig in the IBM Spectrum Scale: Command and
Programming Reference.
maxMBpS
The maxMBpS parameter indicates the maximum throughput in megabytes per second that GPFS can
submit into or out of a single node. GPFS calculates from this variable how many
prefetch/writebehind threads to schedule for sequential file access.
In GPFS version 3.5 and earlier, the default value is 2048. But if the node has a faster interconnect, such
as InfiniBand, 40GigE, or multiple links, you can set the parameter to a higher value. As a general
rule, try setting maxMBpS to twice the I/O throughput that the node can support. For example, if the
node has 1 x FDR link and the GPFS configuration parameter verbRdma has been enabled, then the
expected throughput of the node is 6000 MB/s. In this case, set maxMBpS to 12000.
Setting maxMBpS does not guarantee the desired GPFS sequential bandwidth on the node. All the
layers of the GPFS stack, including the node, the network, and the storage subsystem, must be
designed and tuned to meet the I/O performance requirements.
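Continuing the FDR example above, the setting could be applied to that node as follows (the node name
is a placeholder):
mmchconfig maxMBpS=12000 -N fdrnode1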
maxStatCache
The maxStatCache parameter sets aside the pageable memory to cache attributes of files that are not
currently in the regular file cache. This improves the performance of stat() calls for applications with
a working set that does not fit in the regular file cache. For systems where applications test the
existence of files, or the properties of files, without actually opening them as backup applications do,
increasing the value for maxStatCache can be beneficial.
| For information about the default values of maxFilesToCache and maxStatCache, see the description of
| the maxStatCache attribute in the topic mmchconfig command in the IBM Spectrum Scale: Command and
| Programming Reference.
| In versions of IBM Spectrum Scale earlier than 5.0.2, the stat cache is not effective on the Linux
| platform unless the Local Read-Only Cache (LROC) is configured. For more information, see the
| description of the maxStatCache parameter in the topic mmchconfig command in the IBM Spectrum Scale:
| Command and Programming Reference.
nsdMaxWorkerThreads
NSD server tuning. For more information about GPFS NSD server design and tuning, see NSD Server
Design and Tuning.
pagepool
The pagepool parameter is used to change the size of the data cache on each node. The default value
is either one-third of the physical memory of the node or 1G, whichever is smaller. This value applies
to new installations only. On upgrades, the existing default value is maintained.
The maximum GPFS pagepool size depends on the value of the pagepoolMaxPhysMemPct parameter
and the amount of physical memory on the node. Unlike local file systems that use the operating
system page cache to cache file data, GPFS allocates its own cache called the page pool. The GPFS
page pool is used to cache user file data and file system metadata. Along with file data, the page
pool supplies memory for various types of buffers such as prefetch and write behind. The default
page pool size might be sufficient for sequential IO workloads. The default page pool size might not
be sufficient for Random IO or workloads that involve a large number of small files.
Important: While deploying FPO or when the HAWC feature is enabled, set the
restripeOnDiskFailure parameter to yes.
tiebreakerDisks
For a small cluster with up to eight nodes that have SAN-attached disk systems, define all nodes as
quorum nodes and use tiebreaker disks. With more than eight nodes, use only node quorum. While
defining the tiebreaker disks, you can use the SAN-attached NSD in the file system. The default value
of this parameter is null, which means no tiebreaker disk has been defined.
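For example, to define three SAN-attached NSDs as tiebreaker disks (the NSD names are placeholders):
mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"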
unmountOnDiskFail
The unmountOnDiskFail parameter specifies how the GPFS daemon responds when a disk failure is
detected. The valid values of this parameter are yes, no, and meta. The default value is no.
Important: Set the value of unmountOnDiskFail to meta for FPO deployments or when the file system
metadata and data replication factors are greater than one.
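For example, for an FPO deployment:
mmchconfig unmountOnDiskFail=meta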
workerThreads
The workerThreads parameter controls an integrated group of variables that tune the file system
performance in environments that are capable of high sequential and random read and write
workloads and small file activity.
The default value of this parameter is 48 for a base IBM Spectrum Scale cluster and 512 for a cluster
with protocols installed. A valid value can be any number ranging from 1 to 8192. The -N flag is
valid with this variable. This variable controls both internal and external variables. The internal
variables include maximum settings for concurrent file operations, for concurrent threads that flush
dirty data and metadata, and for concurrent threads that prefetch data and metadata. You can adjust
the following external variables with the mmchconfig command:
v logBufferCount
v preFetchThreads
v worker3Threads
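For example, to raise workerThreads on the protocol nodes (cesNodes here stands for the CES protocol
nodes; substitute a node list or node class appropriate to your cluster):
mmchconfig workerThreads=1024 -N cesNodes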
For more information on the configuration attributes, see “Changing the GPFS cluster configuration data”
on page 4.
The following figure illustrates the GUI high availability configuration with two GUI nodes.
The following list provides the configuration requirements to ensure high availability of the GUI service:
v Up to three GUI nodes can be configured in a cluster. Perform the following steps to set up a GUI
node:
– Install the GUI package on the node. For more information on the latest packages that are required
for different platforms, see IBM Spectrum Scale: Concepts, Planning, and Installation Guide
– Start the GUI service and either log in or run /usr/lpp/mmfs/gui/cli/initgui to initialize the GUI
database. Now, the GUI becomes fully functional and it adds the node to the GUI_MGMT_SERVERS
node class.
v The GUI nodes are configured in the active/active configuration. All GUI nodes are fully functional
and can be used in parallel.
v Each GUI has its own local configuration cache in PostgreSQL and collects configuration changes
individually.
v One GUI node is elected as the master node. This GUI instance exclusively performs tasks that must
be run only once in a cluster, such as running snapshot schedules and sending email and SNMP
notifications. If services that run on the master GUI node are configured, the environment of every
GUI node must support these services. For example, ensure that access to the SMTP and SNMP
servers is possible from all GUI nodes, not only from the master GUI node. You can use the following
utility function, which displays the current master GUI node:
[root@gpfsgui-11 ~]# /usr/lpp/mmfs/gui/cli/lsnode
Hostname IP Description Role Product version Connection status GPFS status Last updated
Before you begin, ensure that you install the server package RPMs on all nodes that you want to
designate as Cloud services nodes. These nodes must have GPFS server licenses enabled.
Also, ensure that a user-defined node class is created and properly configured for Cloud services. For
instructions, see Creating a user-defined node class for Transparent cloud tiering or Cloud data sharing in IBM
Spectrum Scale: Concepts, Planning, and Installation Guide.
To start working with Cloud services, the administrator first needs to designate a node as Cloud services
node in the IBM Spectrum Scale cluster. Data migration to or data recall from a cloud object storage
occurs in this node.
You can designate a maximum of any combination of 4 CES or NSD nodes as Cloud services nodes in
each node class (with a maximum of four node classes for 16 nodes total) in the IBM Spectrum Scale
cluster.
If you use multiple node classes for Cloud services, you can designate at least one node in each node
class as a Cloud services server node.
By default and by way of recommendation, Cloud services use the node IP addresses, not the CES IPs.
Note: You need to perform this procedure only on a single node where the server package is installed.
1. To designate the nodes as Cloud services nodes, issue a command according to this syntax: mmchnode
change-options -N {Node[,Node...] | NodeFile | NodeClass} [--cloud-gateway-nodeclass
CloudGatewayNodeClass].
You can either choose to designate all nodes or only some selected nodes in a node class as Cloud
services nodes.
To designate all nodes in the node class, TCTNodeClass1, as Cloud services server nodes, issue this
command:
mmchnode --cloud-gateway-enable -N TCTNodeClass1
To designate only a few nodes (node1 and node2) in the node class, TCTNodeClass1, as Cloud services
server nodes, issue this command:
mmchnode --cloud-gateway-enable -N node1,node2 --cloud-gateway-nodeclass TCTNodeClass1
It designates only node1 and node2 as Cloud services server nodes from the node class,
TCTNodeClass1. Administrators can continue to use the node class for other purposes.
Note: The Cloud services node must have connectivity to the object storage service that the Cloud
services uses.
2. To designate nodes from multiple node classes as Cloud services server nodes, issue the following
commands:
v mmchnode --cloud-gateway-enable -N TCTNodeClass1
v mmchnode --cloud-gateway-enable -N TCTNodeClass2
Note: These nodes cannot be combined into a single Cloud services across node classes because
Cloud services for nodes in different node classes are always different or separate.
3. To list the designated Transparent cloud tiering nodes, issue this command: mmcloudgateway node
list
You can add a node to the node class at any time. For example, issue the following commands to add the
node, 10.11.12.13, to the node class, TCTNodeClass1.
1. mmchnodeclass TCTNodeClass1 add -N 10.11.12.13
2. mmchnode --cloud-gateway-enable -N 10.11.12.13 --cloud-gateway-nodeclass TCTNodeClass1
Before you try to start Cloud services on a node, ensure that the node is designated as a Cloud services
node. For more information, see “Designating the Cloud services nodes” on page 55.
Start the Cloud services before you run any of the Cloud services commands.
For example, to start the service on all Transparent cloud tiering nodes in a cluster, issue this command:
mmcloudgateway service start -N alltct
To start the service on all Cloud services nodes as provided in the node class, TCTNodeClass1, issue this
command:
mmcloudgateway service start -N TCTNodeClass1
If you provide this command without any arguments, the service is started on the current node.
If you have more than one node class, then you must start the Cloud services individually on each node
class, as follows:
v mmcloudgateway service start -N TCTNodeClass1
v mmcloudgateway service start -N TCTNodeClass2
It is a good practice to verify that the service is started. Enter a command like the following one:
mmcloudgateway service status -N TCTNodeClass
Note: You can run this command from any node in the cluster, not necessarily from a node that is part of
a node class.
Note:
v Before you try to configure a cloud storage account, ensure that the Cloud services are started. For
more information, see “Starting up the Cloud services software” on page 56.
v Before deleting a cloud storage account, ensure that you recall all the data that is migrated to the
cloud.
v Ensure that Network Time Protocol (NTP) is enabled and time is correctly set.
Note: Even though you specify the credentials for the cloud account, the actual validation does not
happen here. The authentication of the credentials happens only when you create a cloud storage access
point. Therefore, you do not receive any authentication error even if you provide some wrong cloud
account credentials.
Next step: See “Defining cloud storage access points (CSAP)” on page 60.
| Amazon S3
| Account creation for Amazon S3
| Note:
| v The following regions are supported for Amazon:
| – us-standard
| – us-west-1
| – us-west-2
| – eu-west-1
| – eu-central-1
| – sa-east-1
| – ap-southeast-1
| – ap-southeast-2
| – ap-south-1
| – ap-northeast-1
| – ap-northeast-2
| v For Amazon, data will be read twice from the file system (once for Sigv4 signature creation and
| another during putBlob operation). Therefore, you might experience some degradation in data
| migration performance with Amazon in comparison to other cloud providers.
| Swift3 account
| Account creation for Swift3
| Note: While using nginx as a load balancer with IBM Cloud Object Storage, ensure that invalid-headers
| and etag attributes are turned off for Transparent cloud tiering to work correctly. Without these settings,
| any Transparent cloud tiering request to IBM Cloud Object Storage would fail with errors that indicate
| signature mismatch.
| Openstack Swift
| Configuring a cloud object storage account for Openstack Swift.
| Note:
| v In case of Swift Dynamic Large Objects, ensure that this configuration is included in the Swift
| “required_filters” section, as follows:
| /usr/lib/python2.7/site-packages/swift/proxy/server.py
|
| required_filters = [
|     {'name': 'catch_errors'},
|     {'name': 'gatekeeper',
|      'after_fn': lambda pipe: (['catch_errors']
|                                if pipe.startswith('catch_errors')
|                                else [])},
|     {'name': 'dlo', 'after_fn': lambda _junk: [
|         'copy', 'staticweb', 'tempauth', 'keystoneauth',
|         'catch_errors', 'gatekeeper', 'proxy_logging']},
|     {'name': 'versioned_writes', 'after_fn': lambda _junk: [
|         'slo', 'dlo', 'copy', 'staticweb', 'tempauth',
|         'keystoneauth', 'catch_errors', 'gatekeeper', 'proxy_logging']},
|     # Put copy before dlo, slo and versioned_writes
|     {'name': 'copy', 'after_fn': lambda _junk: [
|         'staticweb', 'tempauth', 'keystoneauth',
|         'catch_errors', 'gatekeeper', 'proxy_logging']}]
| v For the delete functionality to work in Swift, verify that the bulk middleware is in the pipeline, as
| follows:
| vim /etc/swift/proxy-server.conf
| [pipeline:main] pipeline = healthcheck cache bulk authtoken keystone proxy-server
You can send data to the cloud object storage via Cloud Storage Access Points (CSAPs). Each cloud
account needs at least one CSAP defined in order to have a path to the cloud. For some cases (IBM
SoftLayer®, Amazon S3, or cloud storage with a load balancer with built-in redundancy) one accessor
suffices. However, for cases where traffic is going directly to the object storage, it is usually beneficial to
have more than one CSAP to provide needed availability and bandwidth for performance. For example, if
you are designing an on-premise solution with IBM Cloud Object Storage, you will want to create one
access point for each accessor node you want to send data to. The Cloud services will randomly assign
work to the available accessors as long as they are performing properly (broken or slow access points are
avoided).
Note: If multiple intermediate certificates are issued by an internal certifying authority (CA), ensure to
provide only a self-signed internal CA rather than providing a file that contains all the intermediate
certificates. For example, if the CA issued a certificate chain such as Internal CA->cert1->cert2, then the
input pem file should contain only the Internal CA certificate.
Note: In proxy-based environments, set your proxy settings as part of the node class configuration before
you run any migrations. If tiering commands (migrate/recall) are run before you set the proxy details,
they might fail for not being able to reach out to the public cloud storage providers such as Amazon S3.
For more information, see the mmcloudgateway command in the IBM Spectrum Scale: Command and
Programming Reference.
For data movement (sharing or tiering) commands, you must specify a Cloud service name to move data
to the intended object storage if there is more than one Cloud service that is configured to the file system
or file set. However, you do not have to specify a Cloud service name for these data movement
commands if only one Cloud service is configured for a file system or file set.
Additionally, if you want to execute both tiering and sharing operations in your Cloud services setup,
you must define one Cloud service for tiering and another for sharing.
v To create a Cloud service according to the cloud account that is created, issue a command similar to
the following one:
mmcloudgateway cloudService create
--cloud-nodeclass TCTNodeClass1 --cloud-service-name mycloud
--cloud-service-type Tiering --account-name Cleversafe_cloud
Note: You can use this Cloud service only for tiering. If you want to use it for sharing, you can replace
Tiering with Sharing.
v To update Cloud services, issue a command according to the following:
mmcloudgateway cloudService update --cloud-nodeclass cloud --cloud-service-name newServ --disable
Before you configure Cloud services with IBM Security Key Lifecycle Manager, ensure that an SKLM
server is installed. For more information, see the Preparation for encryption topic in the IBM Spectrum Scale:
Administration Guide.
Note:
v Transparent cloud tiering only supports IBM Security Key Lifecycle Manager versions 2.6.0 and 2.7.0.
v Transparent cloud tiering cannot communicate with IBM Security Key Lifecycle Manager server that
does not support TLSv1.2.
You can create a key manager if you want to use this parameter while you configure a container pair set
in the next topic.
v To create an SKLM key manager, issue a command similar to the following command:
mmcloudgateway keymanager create --cloud-nodeclass cloud
--key-manager-name vm1
--key-manager-type RKM
--sklm-hostname vm1
--sklm-port 9080
--sklm-adminuser SKLMAdmin
--sklm-groupname tct
Next step: “Binding your file system or fileset to the Cloud service by creating a container pair set.”
Note: The local key manager is simpler to configure and use. It might be your best option unless you are
already using SKLM in your IBM Spectrum Scale cluster or in cases where you have special security
requirements that require SKLM.
Cloud services internally create two containers on cloud storage, one for storing data and one for
metadata. However, some cloud providers require containers to be created using their native interfaces.
In that case, you need to provide the names. The containers that are created for Cloud data sharing can
be shared with other file systems or Cloud services. However, the containers that are created for tiering
cannot be shared. Creating the container pair set is how you bind a file system to a Cloud service. Note
that all filesets being bound to a file system must be assigned to the same Cloud services node class.
Note: You must create a new container when the existing container has approximately 100,000,000 (100
million) files. If you create a new container for the same path upon exceeding the 100 million limit, the
previous container goes to the "Inactive" state. It allows new migrations from that point onwards to go to
the newly created container. In turn, creation of a new container only affects target container for new
migrations, whereas recalls are unaffected.
You must decide whether or not to add encryption to the container at the time of creating it, and
accordingly use the KeyManagerName parameter. If you create a container pair without a key manager,
you will not be able to add encryption to the container at a later point by using the mmcloudgateway
ContainerPairSet update command.
Note: In this release, containers do not have encryption enabled. Hence, administrators need to explicitly
add "--enc ENABLE" parameter while creating a container pair set, to ensure that the data is encrypted
while being tiered to a cloud storage.
If you have applications that frequently access the front end of the file, you might want to consider
enabling thumbnail support. An example of an application that accesses the first few bytes of a file is
Windows Explorer in order to provide a thumbnail view of image files. There is no limit on the amount
of data you can cache as the appropriate cache size is application specific. You can create a container pair
set with thumbnail enabled, and the scope can be enabled to either a file system or a fileset according to
your business requirements.
Note:
v Changing the mount point for a file system or the junction path of a fileset after it is associated with a
container is not supported.
v You can enable or disable transparent recall policy by using the --transparent-recalls
{ENABLE|DISABLE} parameter. However, this parameter is optional, and transparent recall policy is
enabled by default even if you do not use this parameter.
Note: If you do not specify the names for storing data and metadata containers, then the container
pairset name is used for both data and metadata containers. In this example, they are "newContainer"
(for data) and "newContainer.meta" (for metadata).
v To create a container pairset with thumbnail enabled and the scope is a file system, issue a command
similar to this:
mmcloudgateway containerpairset create --cloud-nodeclass cloud --container-pair-set-name x13
--cloud-service-name newServ --scope-to-filesystem --path /gpfs --thumbnail-size 64
v To create a container pairset when the scope is a fileset, issue a command similar to this:
mmcloudgateway containerpairset create --cloud-nodeclass cloud --container-pair-set-name x13
--cloud-service-name newServ --scope-to-fileset --path /gpfs/myfileset
v To create a container pair set that is enabled for encryption, issue a command similar to this:
mmcloudgateway containerpairset create --cloud-nodeclass tct --container-pair-set-name
Containeretag5 --cloud-service-name csss5 --path /gpfs0/fs3 --enc ENABLE --etag ENABLE
--data-container test5 --meta-container testmeta5 --key-manager-name lkm3 --scope-to-fileset
v To configure a container pair set using an immutable fileset with a fileset scope, issue a command
similar to this:
mmcloudgateway containerpairset create --cloud-nodeclass tct --container-pair-set-name wormcp
--cloud-service-name wormservice2 --path /gpfs0/worm2 --enc ENABLE --etag ENABLE --data-container
wormtestnov --meta-container wormtestnovmeta --key-manager-name lkm3 --scope-to-fileset
--cloud-directory-path /gpfs0/fs3
Note: Here, the fileset is an immutable fileset whereas the cloud directory is pointing to a fileset that is
not immutable.
v To test a container pair set that is created, issue a command similar to this:
mmcloudgateway containerpairset test --cloud-nodeclass cloud --container-pair-set-name vmip51
Note: This test will check whether or not the container pair set does actually exist. Additionally, the
test will try to add some data to the container (PUT blob), retrieve the data (GET blob), delete the data
(DELETE blob), and report the status of each of these operations. This test will validate whether or not
all CSAPs for a given container pair set are able to reach the cloud storage.
v To delete a container pair set, issue a command similar to this:
mmcloudgateway containerpairset delete --container-pair-set-name x13
--cloud-nodeclass cloud
v To list a container pair set, issue a command similar to this:
mmcloudgateway containerpairset list
Note: Now that you have created your container configuration, it is critical that you do the following:
v Back up your configuration and security information for disaster recovery purposes. For more
information, see “Scale out backup and restore (SOBAR) for Cloud services” on page 679.
v A review of background container pair set maintenance activities is highly recommended. For more
information, see the Planning for maintenance activities topic in the IBM Spectrum Scale: Concepts,
Planning, and Installation Guide.
For more information, see the mmcloudgateway command in the IBM Spectrum Scale: Command and
Programming Reference.
To back up the Transparent cloud tiering database, issue a command according to the following syntax:
mmcloudgateway files backupDB --container-pair-set-name <containerpairsetname>
For example, to back up the database that is associated with the container, cpair1, issue this command:
mmcloudgateway files backupDB --container-pair-set-name cpair1
If the database size is large, then the backup operation can be a long-running process.
Note: By using the backed-up database, you can perform a database recovery by using the
mmcloudgateway files rebuildDB command. For more information, see “Manual recovery of Transparent
cloud tiering database” on page 679.
To back up the configuration data, issue a command according to the following syntax:
mmcloudgateway service backupConfig --backup-file <name of the file including the path>
Note: The files that are collected as part of the backup are the settings files for each node group.
You can specify a path along with the file name. If the path does not exist, then the command creates the
path. The backed-up files are stored in a tar file, which is saved under the specified folder. If the path is
not specified, then the tar file is stored in the local directory.
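For example, to write the backup tar file under an example path /tmp/tctbackup:
mmcloudgateway service backupConfig --backup-file /tmp/tctbackup/tct_config_backup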
Note: It is a best practice to save the backup file in a safe location outside the cluster to ensure that the
backup file can be retrieved even if the cluster goes down. For example, when you use encryption, no
copy of the key library is made to cloud storage by Transparent cloud tiering. Therefore, if there is a
disaster in which a cluster is destroyed, you must make sure that the key library is safely stored on a
remote cluster. Otherwise, you cannot restore the Transparent cloud tiering service for files that are
encrypted on the cloud because the key to decrypt the data in the cloud is no longer available.
A good way to back up the Cloud services configuration is as a part of the SOBAR based backup and
restore script that is included. For more information, see “Scale out backup and restore (SOBAR) for
Cloud services” on page 679.
The regular maintenance tasks are backing up the cloud database, reconciling files between the file
system and the object storage, and deleting cloud objects that are marked for deletion. These activities are
automatically done by Cloud services according to the default schedules. However, if the default
schedules do not suit your requirements, you can modify the schedules and create your own maintenance
windows by using the mmcloudgateway maintenance command.
Before setting up a maintenance window, review the guidelines provided here: Planning for maintenance
activities topic in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Summary:
===================
Total Containers : 1
Total Overdue : 0
Total In Progress : 0
Reconcile Overdue : 0
Backup Overdue : 0
Retention Overdue : 0
===================
Container: producer-container
==========================
Status : Active
In Progress : no
File Count : 5
Files Deleted (last run) : 0
maintenanceName : defaultWeekly
type : weekly
startTime : 6:01:00
endTime : 1:01:00
enabled : true
maintenanceName : main2
type : weekly
startTime : 1:07:00
endTime : 2:07:00
enabled : true
taskFrequencyName : default
backupFrequency : weekly
reconcileFrequency : monthly
deleteFrequency : daily
v To update the maintenance schedule, issue a command as follows:
mmcloudgateway maintenance update --cloud-nodeclass cloud --maintenance-name dd --daily 08:00-09:00
v To delete a maintenance schedule, issue a command as follows:
mmcloudgateway maintenance delete --cloud-nodeclass 99 --maintenance-name main1
v To disable the maintenance schedule, issue a command as follows:
mmcloudgateway maintenance setState --cloud-nodeclass cloud --maintenance-name main2 --state disable
When you check the output of the mmcloudgateway maintenance list command, you can see the status
of the enabled field as false.
Note: Disabling maintenance activities permanently is not recommended nor is it a supported mode of
operation.
v To set the frequency of a specific maintenance operation, issue a command as follows:
mmcloudgateway maintenance setFrequency --cloud-nodeclass cloud --task reconcile --frequency daily
By default, all operations (reconcile, backup, and delete) are done according to the default frequency
when a maintenance task is run. You can use the setFrequency option to modify the default frequency of
a specific operation. For example, the default frequency for the reconcile operation is monthly, but you
can change the frequency of the reconcile operation to weekly.
When a daily, weekly, or monthly frequency is specified for an operation, what it really means is that
the operation will be executed no more often than its specified frequency. So, for example, an operation
with a daily frequency will run no more often than once per day.
After you create a cloud storage account, you will want to create a policy for exporting data to cloud
storage. A sample policy is provided below. You can run this policy manually as needed or set it up to
run periodically – a cron job is frequently employed for this purpose. This policy is meant to run weekly
and exports files greater than one day old and less than eight days old.
Note: Do not reduce the modified age to 0 if you want to run the policy in a cron job, because it will
pick up only a partial list of files for day 0 and you might end up with gaps or duplicates across your
week-to-week policy runs.
define(
modified_age,
(DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME))
)
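The define shown above is only the age computation; the rest of such a policy would name an external
cloud pool and a migration rule that uses modified_age. The following is a minimal sketch under those
assumptions; the pool name, rule name, and executable path are illustrative and must be adapted to your
installation:
RULE EXTERNAL POOL 'mcstore' EXEC '/opt/ibm/MCStore/bin/mcstore' OPTS '-F'
RULE 'MigrateToCloud' MIGRATE FROM POOL 'system' TO POOL 'mcstore'
    WHERE modified_age > 1 AND modified_age < 8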
Cloud services use some default configuration parameters. You can change the value of the parameters if
the default settings do not suit your requirements.
The following table provides the list of configurable parameters and their description:
Table 5. Attributes and default values

connector.server.timeout
  Default value: 5000 (ms). Minimum value: 1000 (ms). Maximum value: 15000 (ms).
  The maximum amount of time that the server takes to respond to a client request. If the request is
  not fulfilled within this time, the connection is closed.
connector.server.backlog
  Default value: 0. Minimum value: 0. Maximum value: 100.
  The maximum queue length for incoming connection indications (requests to connect). If a
  connection indication arrives when the queue is full, the connection is refused.
destroy.sql.batchsize
  Default value: 8196. Minimum value: 8196. Maximum value: 81960.
  Page size per delete operation on local database objects.
destroy.cloud.batchsize
  Default value: 256. Minimum value: 256. Maximum value: 81960.
  Page size per delete operation on cloud objects.
reconcile.sql.batchsize
  Default value: 8196. Minimum value: 8196. Maximum value: 81960.
  Reconcile processes files in batches. This parameter controls how many files are processed in a
  batch.
commands.reconcile.lockwait.timeout.sec
  Default value: 360 (s). Minimum value: 60 (s). Maximum value: 3600 (s).
  Maximum time to acquire the lock on a directory for the reconcile operation.
threads.gc.batchsize
  Default value: 4096. Minimum value: 4096. Maximum value: 40960.
  Page size of the garbage collector thread. You can increase this value if memory usage is high.
migration.downgrade.lock.threshold.size.mb
  Default value: 64 (MB). Minimum value: 1 (MB). Maximum value: 64 (MB).
  Sets the size threshold on files for which the lock downgrade is completed. To save time, a lock
  downgrade is not completed on shorter files that can transfer quickly. For larger files, a lock
  downgrade is suggested because migration might take a long time.
cloud-retention-period-days
  Default value: 30. Minimum value: 0. Maximum value: 2147483647.
  Number of days for which the migrated data needs to be retained on the cloud after its file system
  object has been deleted or reversioned.
where:
v <--cloud-nodeclass> is the node class that is configured for the Cloud services nodes.
v <Attribute> is the attribute and the value that you provide.
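For example, to change one of these attributes for the Cloud services node class, a command of the
following form can be used. This is a sketch only; it assumes that the mmcloudgateway config set
subcommand is available and that the node class is named cloud:
mmcloudgateway config set --cloud-nodeclass cloud connector.server.timeout=10000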
Performance monitoring collector (pmcollector) and sensor (pmsensors) services are installed under
/opt/IBM/zimon.
Note: You must install the sensors on all the cloud service nodes, but you need to install the collectors on
any one of the GPFS nodes. For installation instructions, see Manually installing the Performance Monitoring
tool topic in the IBM Spectrum Scale: Administration Guide.
For more information on each of these methods, see Configuring the performance monitoring tool in IBM
Spectrum Scale: Problem Determination Guide.
GPFS-based configuration
This topic describes the procedure for integrating cloud service metrics with the performance monitoring
tool by using GPFS-based configuration.
1. On the Cloud services nodes, copy the following files from /opt/ibm/MCStore/config folder to
/opt/IBM/zimon folder:
v TCTDebugDbStats
v TCTDebugLweDestroyStats
v TCTFsetGpfsConnectorStats
v TCTFsetIcstoreStats
v TCTFsGpfsConnectorStats
v TCTFsIcstoreStats
2. Register the sensor in the GPFS configuration by storing the following snippet in the
MCStore-sensor-definition.cfg file:
sensors=
{
# Transparent cloud tiering statistics
name = "TCTDebugDbStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTDebugLweDestroyStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTFsetGpfsConnectorStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTFsetIcstoreStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTFsGpfsConnectorStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTFsIcstoreStats"
period = 10
type = "Generic"
}
Note: The sensor definition file can list multiple sensors separated by commas (,).
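After the sensor definition file is saved, register it with the performance monitoring configuration. The
following is a minimal sketch, assuming that the file was saved under /opt/IBM/zimon and that the
mmperfmon config add command is used to register GPFS-managed sensors:
mmperfmon config add --sensors /opt/IBM/zimon/MCStore-sensor-definition.cfg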
For more information on GPFS-based configuration, see the topic mmperfmon command in the IBM
Spectrum Scale: Command and Programming Reference guide.
File-based configuration
This topic describes how to configure Cloud services with the performance monitoring tool by using
file-based (manual) configurations.
Note: You must delete the sensors that are used in the earlier releases. If the scope of your Cloud
services configuration is file system, then you do not need to configure the sensor files that start with
TCTFset*. Similarly, if the scope of your Cloud services configuration is fileset, then you do not need to
configure the sensor files that start with TCTFs*.
To integrate the performance monitoring tool with Cloud services server nodes, do the following steps:
1. Copy /opt/IBM/zimon/defaults/ZIMonSensors.cfg to /opt/IBM/zimon. This configuration file
determines which sensors are active and their properties.
Note: If the collectors are already configured at /opt/IBM/zimon/ZIMonSensors.cfg, use the same
collectors.
3. Edit the /opt/IBM/zimon/ZIMonSensors.cfg file to append the following sensors at the end of the
sensor configuration section:
sensors=
{
# Transparent cloud tiering statistics
name = "TCTDebugDbStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTDebugLweDestroyStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTFsetGpfsConnectorStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTFsetIcstoreStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTFsGpfsConnectorStats"
period = 10
type = "Generic"
},
{
#Transparent cloud tiering statistics
name = "TCTFsIcstoreStats"
period = 10
type = "Generic"
}
Note: Each sensor should be separated by a comma. The period is the frequency in seconds at which
the performance monitoring tool polls the cloud service for statistics. The period is set to 10 seconds
but it is a configurable value. The sensor is turned off when the period is set to 0.
4. Copy the following files from /opt/ibm/MCStore/config folder to /opt/IBM/zimon folder.
v TCTFsGpfsConnectorStats.cfg
v TCTFsIcstoreStats.cfg
v TCTFsetGpfsConnectorStats.cfg
v TCTFsetIcstoreStats.cfg
v TCTDebugLweDestroyStats.cfg
v TCTDebugDbStats.cfg
5. Restart the sensors by using this command: service pmsensors restart
6. Restart the collector by using this command: service pmcollector restart
Note:
If the collector is already installed and is running, then only pmsensors service needs to be restarted.
If you are installing both pmcollectors and pmsensors, then both services need to be restarted.
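Optionally, confirm that the services are running after the restart. A minimal sketch that uses the same
service interface as the preceding steps:
service pmsensors status
service pmcollector status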
Figure: Remotely mounted file system with transparent recall enabled
For example, to set the GPFS variable on local cluster for remote cluster tctvm1.pk.slabs.ibm.com,
issue the following command:
mmccr vput tctvm1.pk.slabs.ibm.com 192.0.2.0
b. For multiple remote clusters that are mounted, set the variables as follows:
mmccr vput tct<remote_cluster1_name> <IP_address>
mmccr vput tct<remote_cluster2_name> <IP_address>
For example, to set the GPFS variable on the local cluster for two remote clusters,
tctvm1.pk.slabs.ibm.com and tctvm2.pk.slabs.ibm.com, issue the following commands:
mmccr vput tctvm1.pk.slabs.ibm.com 198.51.100.1
mmccr vput tctvm2.pk.slabs.ibm.com 198.51.100.2
4. Copy the mcstore script from the local cluster to the remote cluster by issuing the following
command:
IBM Spectrum Scale provides the immutability feature where you can associate a retention time with
files, and any change or deletion of file data is prevented during the retention time. You can configure an
IBM Spectrum Scale fileset with an integrated Archive Manager (IAM) mode by using the mmchfileset
command. Files stored in such an immutable fileset can be set to immutable or append-only by using
standard POSIX or IBM Spectrum Scale commands. For more information on immutability features
available in IBM Spectrum Scale, see “Immutability and appendOnly features” on page 417.
After the immutability feature is configured in IBM Spectrum Scale, you can ensure that files that are stored
on the Object Storage are immutable by leveraging the locked vault feature available in IBM Cloud Object
Storage.
Locked vaults enable storage vaults to be created and registered under the exclusive control of an
external gateway application. IBM Cloud Object Storage stores objects received from the gateway
application. The gateway authenticates to the IBM Cloud Object Storage Manager exclusively by using an
RSA private key and certificate that was configured to create a locked vault and registered only with the
gateway. After that, the normal S3 APIs can be used against the Accesser® nodes by using the configured
private key and certificate. Accesser API key and secret key for S3 API cannot be used for authentication
or authorization. If a key is compromised, the gateway rotates keys by calling the Rotate Client Key
Manager REST API. This API replaces the existing key and revokes the old certificates. A locked vault
with data cannot be deleted by the IBM Cloud Object Storage Administrator, and its ACLs cannot be
changed. Additionally, it cannot be renamed or have proxy setting enabled. For more information about
locked vaults, see IBM Cloud Object Storage System Locked Vault Guide.
Note: To configure the WORM feature at the fileset level, it is recommended that you match the immutable filesets
with immutable container pair sets on the cloud.
For more information, see the Immutability and appendOnly features in IBM Spectrum Scale:
Administration Guide.
Note: You can perform these procedures either manually or by using the scripts available in the package.
Once the private key is obtained, it can be used to create the locked vaults on the IBM Cloud Object
Storage system. Additionally, for HTTPS (TLS), the CA certificate of the IBM Cloud Object Storage system
is also required.
The two locked vaults required for Transparent cloud tiering (data and metadata vaults) need to be
created on the IBM Cloud Object Storage by using the create vault from template REST API. Once these
vaults are created, they can be specified on the mmcloudgateway filesystem create command via the
--container-prefix option.
This topic describes the procedure for setting up a private key and a private certificate for deploying
WORM solutions by using IBM Cloud Object Storage.
The first step involves creating a certificate signing request (CSR) and registering the client certificate
with IBM Cloud Object Storage Manager via Client Registration REST API and obtaining a private key
(RSA based) signed with IBM Cloud Object Storage Manager Certificate Authority. Once the signed
private certificate is obtained, you can use the RSA private key and private certificate for creating the
locked vaults on the IBM Cloud Object Storage system. Additionally, for HTTPS (TLS) communication,
the root CA certificate of the IBM Cloud Object Storage system is also required.
Note:
v A private account must be created before an automation script or procedure is run.
v A private account must be created each time an incorrect IBM Cloud Object Storage CA certificate is
specified while generating the Keystore.
1. Create a directory that will hold private key and certificates by issuing this command:
$mkdir mydomain2.com.ssl/
2. To generate a keystore that will store the private key, CSR, and certificates, issue the following
command in the /opt/ibm/MCStore/jre/bin directory:
keytool -genkey -alias mydomain2 -keyalg RSA -keysize 2048 -keystore mydomain2.jks
Note: You should make a note of the alias name as it has to be used in the later steps.
3. Generate CSR by issuing the following command:
keytool -certreq -alias mydomain2 -keyalg RSA -file mydomain2.csr -keystore mydomain2.jks
The CSR contains content similar to the following:
-----BEGIN NEW CERTIFICATE REQUEST-----
MIICzjCCAbYCAQAwWTELMAkGA1UEBhMCSU4xCzAJBgNVBAgTAktBMRIwEAYDVQQHEwlCYW5nYWxv
cmUxDDAKBgNVBAoTA1NEUzENMAsGA1UECxMESVNETDEMMAoGA1UEAxMDSUJNMIIBIjANBgkqhkiG
9w0BAQEFAAOCAQ8AMIIBCgKCAQEApfVgjnp9vBwGA6Y/g54DBr1wWtWeSAwm680M42O1PUuRwV92
9UDBK9XEkY2Zb+o08Hvspd5VMU97bV7cnN8Fi8WuujHCdgAVuezTT0ZCHjVHl2L6CYql7hmWIazk
TOaROoYlhzZCgQrDyVNIw6XuvkWo3eUIRyi1r6nafUFiqUtMEerEhEYa6cmm5qpeb2GKYJdeN53W
SF0yrUCi9gRgPJiAq6lVSl+wWekbI6lwIAtJVyojx93lRl/KdxfFmh/sriUx//a6+I0OBli6EmEV
BsHeG2HccS1diJ4+eUetXvfkYMjO6kRvYraSVKX022a4Jqki8iYDNf4XvRzOz5YbLQIDAQABoDAw
LgYJKoZIhvcNAQkOMSEwHzAdBgNVHQ4EFgQUrgpT7F8Z+bA9qDxqU8PDg70zFj4wDQYJKoZIhvcN
AQELBQADggEBADW4xuxBaaH9/ZBLOll0tXveSHF8Q4oZo2MhSWf34Shu/ZxC17H8NqCCMyxqVdXI
6kbdg1se5WLCq/JJA7TBcgCyJJqVjADt+RC+TGNc0NlsC7XpeRYLJtxqlKilsWnKJf5oRvA1Vg5P
nkTjCE9XvUzhJ/tTQjNBJSh8nN7Tbu/q5mTIGG9imARPro2xQpvwiFMHrq/f1uNeZ3SeuLxwQtkK
4zge7XwyY63lrKsN0z2a4CPNzU0q68TGL1aE93QDpJYusSeTB0m2om4iTSNgsQKRmYqGDSXM3no/
90UeTAgHjhJ82bGEOfP9FVm+6FnYydr1Endg1aEizC+sArk4e8E=
-----END NEW CERTIFICATE REQUEST-----
Note: You can set up a private key and a private certificate by using the mcstore_lockedvaultpreconfig.sh
script that is available at /opt/ibm/MCStore/scripts.
Setting up a private key and private certificate by using the automation script
When you run the mcstore_lockedvaultpreconfig.sh script, the system displays output similar to this:
Transparent Cloud Tiering Server RPM already installed. Proceeding with Configuration...
Two locked vaults are to be created for deploying WORM solutions by using IBM Spectrum Scale.
IBM Cloud Object Storage Manager enables administrators to create vaults, which are under the exclusive
control of a given external application (Transparent cloud tiering). This allows the application to have full
control over the vault, but does not allow a user or administrator to bypass the application and directly
access the vault. Users are allowed to create WORM-style vaults that enforce read or write restrictions on
the objects in the vault, which an administrator cannot bypass.
The two locked vaults required for Transparent cloud tiering (data and metadata vaults) need to be
created on the IBM Cloud Object Storage by using Create vault from the template REST API. Once these
vaults are created, they can be specified on the mmcloudgateway filesystem create command via the
--container-prefix option.
Note: You can create a locked vault by using the mcstore_createlockedvault.sh script available at
/opt/ibm/MCStore/scripts.
1. Convert the JKS keystore to the PKCS12 format by issuing this command:
keytool -importkeystore -srckeystore mydomain2.jks -destkeystore new-store.p12 -deststoretype PKCS12
2. Extract the private key and convert it to an RSA key by issuing the following commands:
v openssl pkcs12 -in "<keystore_directory>"/new-store.p12 -nocerts
-out "<keystore_directory>"/privateKey.pem -passin pass:<keystore_password>
-passout pass:<keystore_password>
v openssl rsa -in "<keystore_directory>"/privateKey.pem -out "<keystore_directory>"/rsaprivateKey.pem
-passin pass:<keystore_password>
3. By using the private key and certificate, create a locked vault (one for data and one for metadata) by
issuing the following commands:
v For data vault:
curl --key ./privateKeynew.pem --cert <certificate-file> -k -v
’https://9.114.98.187/manager/api/json/1.0/createVaultFromTemplate.adm’
-d ’id=1&name=demolockedvault&description=newlockedvaultdescription’
v For metadata vault:
curl --key ./privateKeynew.pem --cert <certificate-file> -k -v
’https://9.114.98.187/manager/api/json/1.0/createVaultFromTemplate.adm’
-d ’id=1&name=demolockedvault.meta&description=newlockedvaultmetadescription’
Note: To find the provisioning template IDs, on the IBM Cloud Object Storage Manager GUI, click
Template Management, hover the mouse over the template that is listed under Vault Template, and
find the number that is displayed on the footer.
4. Print the locked vaults by issuing this command:
curl --key privateKeynew.pem --cert <certificate-file> -k ’<COS Accesser IP Address>’
Note: The names of the locked vaults must be noted down, and they must be specified to the
mmcloudgateway filesystem create command by using the --container-prefix option.
Creating a locked vault by using automation scripts
a. Go to /opt/ibm/MCStore/scripts and run mcstore_createlockedvault.sh <keystorealiasname>
<keyStorePath> <lockeddatavaultname> <lockeddatavaultDescription> <lockedmetavaultname>
<lockedmetavaultDescription> <COSManagerIP> <dataVaultTemplateID> <metaVaultTemplateID>,
where all parameters are mandatory.
b. For description of the parameters, see the mmcloudgateway command.
For example, mcstore_createlockedvault.sh test /root/svt/test.ssl/test.jks
demodatacontainer test demometacontainer metacontainer 9.10.0.10 1 1.
The system displays output similar to this:
Enter KeyStore Password:
Validating the inputs and the configuration....
COS Manager is reachable. Proceeding with Configuration...
Transparent Cloud Tiering Server RPM already installed. Proceeding with Configuration...
openssl libraries are already installed. Proceeding with Configuration...
Be sure to keep a backup copy of the source keystore that you used to import the private key and
certificates. The mmcloudgateway account delete command removes the private key and certificates from
the trust store.
1. Get the client certificate for IBM Cloud Object Storage Accesser.
2. Create a cloud storage account by using the mmcloudgateway account create command. For more
information, see “Managing a cloud storage account” on page 57.
3. Create a cloud storage access point (CSAP) by using the mmcloudgateway cloudStorageAccessPoint create
command. For more information, see the mmcloudgateway command in IBM Spectrum Scale: Command
and Programming Reference.
4. Create a cloud service by using the mmcloudgateway cloudservice create command. For more
information, see “Creating Cloud services” on page 61.
5. Configure Cloud services with SKLM by using the mmcloudgateway keyManager create command. For
more information, see “Configuring Cloud services with SKLM (optional)” on page 62.
6. Create a container pair set by using the mmcloudgateway containerpairset create command. For
more information, see “Binding your file system or fileset to the Cloud service by creating a container
pair set” on page 63.
7. Perform migrate and recall operations by using commands or policies.
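For example, an individual file can be migrated to and recalled from the cloud tier with the
mmcloudgateway files commands; the file path shown here is illustrative:
mmcloudgateway files migrate /gpfs/fs1/dir1/file1
mmcloudgateway files recall /gpfs/fs1/dir1/file1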
Note: Before you perform this procedure, ensure that no active migration is currently taking place. After
you perform this procedure, the old keys will not work.
1. Generate a new CSR using a new alias:
keytool -certreq -alias mydomainnew -keyalg RSA -file mydomainnew.csr -keystore mydomain2.jks
2. Get the CSR signed by sending it to the IBM Cloud Object Storage Manager:
curl --cacert {path to ca certificate} --key {path to RSA private key}
--cert {path to old certificate}
'https://<COS Manager IP>/manager/api/json/1.0/rotateClientKey.adm'
-d 'expirationDate=1508869800000' --data-urlencode 'csr=
-----BEGIN NEW CERTIFICATE REQUEST-----
MIICzjCCAbYCAQAwWTELMAkGA1UEBhMCSU4xCzAJBgNVBAgTAktBMRIwEAYDVQQHEwlCYW5nYWxv
cmUxDDAKBgNVBAoTA1NEUzENMAsGA1UECxMESVNETDEMMAoGA1UEAxMDSUJNMIIBIjANBgkqhkiG
9w0BAQEFAAOCAQ8AMIIBCgKCAQEApfVgjnp9vBwGA6Y/g54DBr1wWtWeSAwm680M42O1PUuRwV92
9UDBK9XEkY2Zb+o08Hvspd5VMU97bV7cnN8Fi8WuujHCdgAVuezTT0ZCHjVHl2L6CYql7hmWIazk
TOaROoYlhzZCgQrDyVNIw6XuvkWo3eUIRyi1r6nafUFiqUtMEerEhEYa6cmm5qpeb2GKYJdeN53W
SF0yrUCi9gRgPJiAq6lVSl+wWekbI6lwIAtJVyojx93lRl/KdxfFmh/sriUx//a6+I0OBli6EmEV
BsHeG2HccS1diJ4+eUetXvfkYMjO6kRvYraSVKX022a4Jqki8iYDNf4XvRzOz5YbLQIDAQABoDAw
LgYJKoZIhvcNAQkOMSEwHzAdBgNVHQ4EFgQUrgpT7F8Z+bA9qDxqU8PDg70zFj4wDQYJKoZIhvcN
AQELBQADggEBADW4xuxBaaH9/ZBLOll0tXveSHF8Q4oZo2MhSWf34Shu/ZxC17H8NqCCMyxqVdXI
6kbdg1se5WLCq/JJA7TBcgCyJJqVjADt+RC+TGNc0NlsC7XpeRYLJtxqlKilsWnKJf5oRvA1Vg5P
nkTjCE9XvUzhJ/tTQjNBJSh8nN7Tbu/q5mTIGG9imARPro2xQpvwiFMHrq/f1uNeZ3SeuLxwQtkK
4zge7XwyY63lrKsN0z2a4CPNzU0q68TGL1aE93QDpJYusSeTB0m2om4iTSNgsQKRmYqGDSXM3no/
90UeTAgHjhJ82bGEOfP9FVm+6FnYydr1Endg1aEizC+sArk4e8E=
-----END NEW CERTIFICATE REQUEST-----'
After rotating the client key, use the new certificate and private key to create locked vaults. On
Transparent cloud tiering, update the cloud account by using the mmcloudgateway account update
command.
Rotating Client key or revoking old certificate by using the automation script
a. Run mcstore_lockedvaultrotateclientKey.sh <keystorenewaliasname> <keystoreoldaliasname>
<keyStorePath> <COSManagerIP> <expirationDays> <COSCACertFile>, where the first 4
parameters are mandatory and the last two parameters (<expirationDays> and <COSCACertFile>)
are optional.
If the expiration date (expirationDays) is not specified, then the command will take the default
expiration time, which is 365 days.
If the IBM Cloud Object Storage CA certificate (COSCACertFile) is not specified, then the CA file
will be downloaded from the IBM Cloud Object Storage Manager.
b. For the description of the parameters, see the mmcloudgateway command.
For example, run this command:
./mcstore_lockedvaultrotateclientkey.sh testnew5 test /root/svt/test.ssl/test.jks 9.10.0.10
Updating Transparent cloud tiering with a new private key and certificate
This topic describes how to update Transparent cloud tiering with a new key and certificate.
1. Update the cloud account with the new private key and the certificate by issuing the following
command:
mmcloudgateway account update --cloud-nodeclass tct --account-name mycloud
--src-keystore-path /root/mydomain.jks --src-keystore-alias-name mydomainnew --src-keystore-type jks
--src-keystore-pwd-file /root/pwd
2. For more information, see the mmcloudgateway account update command.
For example, to update the cloud account (node class tct, cloud name mycloud) with new key and
certificate, issue the following command:
mmcloudgateway account update --cloud-nodeclass tct --cloud-name mycloud --src-keystore-path
/root/demold/worm\*.ssl/xyz%n\*.jks --src-keystore-alias-name wormnew --src-keystore-type jks
--src-keystore-pwd-file /root/pwd
Note: Ensure that you have a backup of the Source Key Store used to import the private key and
certificates. Transparent Cloud Tiering removes the private key and certificate from the trust store.
Note: If the kafkaBrokerServers node class already exists, then the -N specified list will not be used.
Issue a command similar to the following example:
mmmsgqueue enable -N {NodeName[,NodeName...] | NodeFile | NodeClass}
For more information, see the mmmsgqueue command in IBM Spectrum Scale: Command and Programming
Reference.
2. To enable a file system for file audit logging, issue the mmaudit command. If the message queue has
not been previously enabled, the first invocation of mmaudit will also enable the message queue for
the entire cluster.
mmaudit Device enable
For more information, see the mmaudit command in IBM Spectrum Scale: Command and Programming
Reference.
Note: If "object" is enabled on the file system that is holding the file audit log fileset, ensure that you
have additional inodes defined for the file audit log fileset prior to enabling file audit logging.
To disable file audit logging on a file system, issue the mmaudit command.
mmaudit Device disable
To disable the message queue, you must also disable all of the file systems. Issue the mmmsgqueue
command.
mmmsgqueue disable
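To review the current file audit logging configuration for a file system, you can use the list action of the
mmaudit command, where Device is the file system name:
mmaudit Device list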
Actions taken when enabling the message queue and file audit logging
This topic explains the steps that occur when the user enables the message queue and file audit logging.
The message queue must be enabled and functioning before file audit logging can be enabled on a file
system. If you use the installation toolkit to enable file audit logging, the toolkit will attempt to enable
the message queue using protocol nodes as the primary message queue nodes. However, if protocol
nodes are not available or if you want to use a different set of nodes as the primary message queue
nodes, then you can use the mmmsgqueue command with the -N option and either specify a list of nodes,
the path to a file containing a list of nodes, or a node class.
Enabling the message queue
The following list contains the steps that occur when you enable the message queue using the
mmmsgqueue command with the -N option:
Note: If any of the following steps fail, the message queue will not be enabled.
1. Sets up the kafkaZookeeperServers node class using the available Linux quorum nodes that
are found locally within the cluster.
Note: If there are not at least three of these nodes with the special message queue RPMs
installed on them that meet the minimum requirements given in
<link_to_Linux_quorum_requirement>, this step will fail and the message queue will not be
enabled.
2. Sets up the kafkaBrokerServers node class using nodes that are specified with the -N
parameter.
Note: If there are not at least three nodes in the resulting list, this step will fail and the
message queue will not be enabled.
3. Modifies the default message queue server (broker) properties file and uploads the modified
properties file to the CCR. Options within the properties file that are changed from the
default are the list of ZooKeeper nodes, the port to use for listening for connections, and
options for maximum size of the local disk space to use and where to store the local queue
data.
4. Modifies the default ZooKeeper properties file and uploads the modified properties file to
the CCR. Properties that are modified include the port for clients to use, the maximum
number of client connections, and the default ZooKeeper data directory (among other
options).
5. Sets up the random passwords for the brokers, consumers, and producers, and uploads the
authentication configuration to the CCR.
6. Obtains the list of ZooKeeper nodes. For each ZooKeeper node, it enables the ZooKeeper on
that node. Enabling the ZooKeeper node involves the following actions that take place
directly on the node:
a. Downloads the template ZooKeeper properties file from the CCR and places it in a local
directory where it will be read whenever the ZooKeeper is started.
b. Creates the unique "myid" file for this particular ZooKeeper in a local directory.
c. Starts the Kafka ZooKeeper process by referencing the local ZooKeeper properties file.
7. Obtains the list of message queue (broker) nodes. For each message queue node, it enables
the broker on that node. Enabling the broker involves the following actions that take place
directly on the node:
a. Downloads the template broker properties file from the CCR and places it in a local
directory where it will be read whenever the broker is started.
Note:
v Like with the message queue, if any of the following steps fail, then the entire enablement of
file audit logging for the given file system fails.
v However, unlike the message queue, any updates to the configuration or filesets will be rolled
back so that the cluster is not left in a state where a file system is partially enabled for file
audit logging.
1. Verifies that the message queue is successfully configured and enabled.
2. If the consumer node class does not exist, it creates it based on the Kafka broker node
class. In addition, the consumer authentication configuration information is pushed to all of
the nodes in the newly created consumer node class.
3. For each message queue server (broker) node in the cluster, verifies that the minimum
amount of required local disk space is available to enable file audit logging for a file system
device. Depending on the number of message queue server (broker) nodes in the cluster, this
check might take some time because each node has to be queried.
4. For each consumer node in the consumer node class, verifies that the sink file system is
mounted on the node. The sink file system is the file system where the file audit logging
fileset is located that contains the audit log files.
5. Updates the audit configuration with the file audit logging configuration information
including the file system that will be audited, the device of the sink fileset, and the sink
fileset name among other attributes.
6. Adds the topic to the message queue representing the file system that will be audited.
The File Audit column in the file systems table displays which file systems are file audit logging enabled.
The File Audit column is hidden by default. To see whether file audit logging is enabled, perform the
following steps:
1. Go to Files > File Systems in the management GUI.
2. Select Customize Columns from the Actions menu.
3. Select File Audit. The File Audit column is visible now.
| Perform the following steps to enable file audit logging for a remotely mounted file system:
| 1. Make sure that both the accessing and owning clusters are on IBM Spectrum Scale 5.0.2 minimum
| release level.
| 2. Make sure that the file systems that are going to be remotely mounted are at IBM Spectrum Scale
| 5.0.2 or higher.
| 3. Follow the instructions in Accessing a remote GPFS file system in the IBM Spectrum Scale: Administration Guide.
| 4. Validate that the accessing cluster has the required packages installed by referring to Requirements and
| limitations of file audit logging in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
| 5. If not already enabled, enable file audit logging in the owning cluster by following the instructions in
| Chapter 7, “Configuring file audit logging,” on page 87.
| 6. At this point, file audit logging should be logging all file system activity from the accessing cluster
| nodes that have fulfilled the previous steps.
| 7. The file audit logs will be owned and located on the owning cluster. Run mmaudit <device> list on
| the owning cluster for details.
| Note: The file audit logging producers on the accessing cluster will log debug messages to the local
| /var/adm/ras/mmaudit.log file. But the overall file audit logging status and logging can only be obtained
| from the owning cluster. For more information, see File audit logging issues in the IBM Spectrum Scale:
| Problem Determination Guide.
Table 8. Configuration parameters at cache and their default values at the cache cluster - Valid values

AFM configuration parameter     Valid values              Default value   Tunable at the   Tunable at the
                                                                          cluster level    fileset level
afmAsyncDelay                   1 - 2147483647            15              Yes              Yes
afmDirLookupRefreshInterval     0 - 2147483647            60              Yes              Yes
afmDirOpenRefreshInterval       0 - 2147483647            60              Yes              Yes
afmDisconnectTimeout            0 - 2147483647, disable   60              Yes              No
afmEnableNFSSec                 Yes | No                  No              Yes              No
afmExpirationTimeout            0 - 2147483647, disable   disabled        Yes              Yes
afmFileLookupRefreshInterval    0 - 2147483647            30              Yes              Yes
afmFileOpenRefreshInterval      0 - 2147483647            30              Yes              Yes
afmEnableAutoEviction           Yes | No                  Yes             No               Yes
afmPrefetchThreshold            0 - 100                   0               No               Yes
afmShowHomeSnapshot             Yes | No                  No              Yes              Yes
Table 10. Configuration parameters at cache for parallel I/O - valid values

AFM configuration parameter     Valid values              Default value   Tunable at the   Tunable at the
                                                                          cluster level    fileset level
afmNumFlushThreads              1 - 1024                  4               No               Yes
afmNumReadThreads               1 - 64                    1               Yes              Yes
afmNumWriteThreads              1 - 64                    1               Yes              Yes
afmParallelReadChunkSize        0 - 2147483647            128             Yes              Yes
afmParallelReadThreshold        0 - 2147483647            1024            Yes              Yes
afmParallelWriteChunkSize       0 - 2147483647            128             Yes              Yes
afmParallelWriteThreshold       0 - 2147483647            1024            Yes              Yes
afmHardMemThreshold             -                         5G              Yes              No
Table 12. Configuration parameters at primary and their default values - Valid values

  AFM configuration parameter   Valid values              Default value   Tunable at the   Tunable at the
                                                                          cluster level    fileset level
  afmAsyncDelay                 1 - 2147483647            15              Yes              Yes
  afmDisconnectTimeout          0 - 2147483647, disable   60              Yes              No
  afmEnableNFSSec               Yes | No                  No              Yes              No
| afmHashVersion                1 | 2 | 4 | 5             2               Yes              No
The parameters in the following table can be used at the cache cluster for tuning parallel I/O:
Table 13. Configuration parameters at cache for parallel I/O

afmNumFlushThreads
  Unit: Whole number. Mode on AFM DR: primary.
  Defines the number of threads used on each gateway to synchronize updates to the home cluster.
  The default value is 4, which is sufficient for most installations. The current maximum value is
  1024, which is too high for most installations. Do not set this parameter to an extreme value.
afmNumWriteThreads
  Unit: Whole number. Mode on AFM DR: primary.
  Defines the number of threads used on each participating gateway node during a parallel write.
  The default value of this parameter is 1. That is, one writer thread is active on every gateway node
  for each big write operation qualifying for splitting as per the parallel write threshold value.
afmParallelWriteChunkSize
  Unit: MB. Mode on AFM DR: primary.
  Defines the minimum chunk size of the write that needs to be distributed among the gateway
  nodes during parallel writes. Values are in bytes.
afmParallelWriteThreshold
  Unit: MB. Mode on AFM DR: primary.
  Defines the threshold beyond which parallel writes become effective. Writes are split into chunks
  when the file size exceeds this threshold value. Values are in MB. The default value is 1024 MB.
Table 14. Configuration parameters at cache for parallel I/O - valid values

AFM configuration parameter     Valid values              Default value   Tunable at the   Tunable at the
                                                                          cluster level    fileset level
afmNumFlushThreads              1 - 1024                  4               No               Yes
afmNumWriteThreads              1 - 64                    1               Yes              Yes
afmParallelWriteChunkSize       0 - 2147483647            128             Yes              Yes
afmParallelWriteThreshold       0 - 2147483647            1024            Yes              Yes
afmHardMemThreshold             -                         5G              Yes              No
Most of these tuning parameters require at least the AFM client to be restarted. Ensure that the home
NFS export is not mounted on the gateway node: unlink the AFM fileset, or stop GPFS (IBM Spectrum
Scale) on the gateway node.
Tuning for 1GigE networks is different from tuning 10GigE networks. For 10GigE, all settings need to be
scaled up, but not necessarily by a factor of 10. Many of these settings are affected after a server reboot.
Therefore, each time a server restarts, the settings must be reset. The TCP buffer tuning is required for all
10GigE links and for 1GigE links where the value of RTT is greater than 0.
For 1 GigE, if there is no round-trip time, leave the default value. You can increase the value to a number
greater than 16 if the round-trip time is large. For 10 GigE, ensure that this value is 48 or a number
greater than 48 depending on the round-trip time.
When you set the seqDiscardThreshold parameter, it affects AFM or AFM DR as follows:
v If I/O requests are from a node that is not the gateway node, there is no effect.
v If the read request is on the gateway node for an uncached file, a higher seqDiscardThreshold value
results in better performance as it allows the gateway to cache more data. When the data is returned to
the application, there is a greater chance that it comes out of the cache/primary cluster.
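For example, the threshold can be raised with the mmchconfig command. This is a sketch only; the value
shown is illustrative and must be chosen to match the memory available on your gateway nodes:
mmchconfig seqDiscardThreshold=1G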
Tuning on both the NFS client (gateway) and the NFS server (the
home/secondary cluster)
This topic describes the tuning on both the NFS client (gateway) and the NFS server (the
home/secondary cluster).
You must set TCP values that are appropriate for the delay (buffer size = bandwidth * RTT).
For example, if your ping time is 50 ms, and the end-to-end network consists of all 100BT Ethernet and
OC3 (155 Mbps), the TCP buffers must be the following: 0.05 sec * 10 MB/sec = 500 KB
If you are connected using a T1 line (1 Mbps) or less, do not change the default buffers. Faster networks
usually benefit from buffer tuning.
The following parameters can also be used for tuning. A 12194304 buffer size is provided here as an
example value for a 1 GigE link with a delay of 120ms. To set these values, set the following
configurations in a file and load it with sysctl -p filename.
The following are example values. Initial testing is required to determine the best value for a particular
system:
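A minimal sketch of the kind of file that can be loaded with sysctl -p follows. It assumes the standard
Linux TCP sysctls and reuses the 12194304 example buffer size; the right values depend on your
bandwidth and round-trip time (buffer size = bandwidth * RTT) and must be confirmed by testing:
net.core.rmem_max = 12194304
net.core.wmem_max = 12194304
net.ipv4.tcp_rmem = 4096 87380 12194304
net.ipv4.tcp_wmem = 4096 65536 12194304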
Note: For TCP tuning, the sysctl value changes do not take effect until a new TCP connection is created,
which occurs at NFS mount time. Therefore, for TCP changes, it is critical that the AFM fileset and the
NFS client are unmounted and GPFS is shut down.
With Red Hat Enterprise Linux 6.1 and later, both the NFS client and the server perform TCP
auto-tuning. The size of the TCP buffer is automatically increased within the limits that are specified
through sysctl. If the client or the server TCP limits are too low, the TCP buffer cannot grow enough for
the various round-trip times between the GPFS clusters. With versions earlier than Red Hat Enterprise
Linux 6.1, NFS is limited in its ability to tune the TCP connection. Therefore, do not use a version earlier
than Red Hat Enterprise Linux 6.1 in the cache/primary cluster.
As a GPFS cluster might be handling local and remote NFS clients, you can set the GPFS server values
for the largest expected round-trip time of any NFS client. This ensures that the GPFS server can handle
clients at various locations. Then on the NFS clients, set the TCP buffer values that are appropriate for the
cluster that they are accessing.
The gateway node is both an NFS server for standard NFS clients if they exist and an NFS client for
communication with the home/secondary cluster. Ensure that the TCP values are set appropriately,
because values that are either too high or too low can negatively impact performance.
If performance continues to be an issue, increase the buffer value by up to 50%. If you increase the buffer
value by more than 50%, it might have a negative effect.
After the NFS values are set, you can mount and access the AFM filesets. The first time the fileset is
accessed, the AFM NFS client mounts the home/secondary server or servers. To see these mounts on a
gateway node, enter the following command: cat /proc/mounts.
Chapter 11. Performing GPFS administration tasks
Before you perform GPFS administration tasks, review topics such as getting started with GPFS,
requirements for administering a GPFS file system, and common command principles.
For information on getting started with GPFS, see the IBM Spectrum Scale: Concepts, Planning, and
Installation Guide. This includes:
1. Installing GPFS
2. GPFS cluster creation considerations
3. Configuring and tuning your system for GPFS
4. Starting GPFS
5. Network Shared Disk creation considerations
6. File system creation considerations
The information for administration and maintenance of GPFS and your file systems is covered in topics
including:
1. “Requirements for administering a GPFS file system” and “Common GPFS command principles” on
page 109
2. Chapter 1, “Configuring the GPFS cluster,” on page 1
3. Chapter 13, “Managing file systems,” on page 115
4. Chapter 15, “Managing disks,” on page 165
5. Chapter 20, “Managing GPFS quotas,” on page 293
6. Chapter 22, “Managing GPFS access control lists,” on page 311
7. Command reference in IBM Spectrum Scale: Command and Programming Reference
8. GPFS programming interfaces in IBM Spectrum Scale: Command and Programming Reference
9. GPFS user exits in IBM Spectrum Scale: Command and Programming Reference
10. Chapter 24, “Considerations for GPFS applications,” on page 343
11. Chapter 14, “File system format changes between versions of IBM Spectrum Scale,” on page 161
On Windows, root authority normally means users in the Administrators group. However, for clusters
with both Windows and UNIX nodes, only the special Active Directory domain user root qualifies as
having root authority for the purposes of administering GPFS. For more information on GPFS
prerequisites, see the topic Installing GPFS prerequisites in the IBM Spectrum Scale: Concepts, Planning, and
Installation Guide.
The GPFS commands are designed to maintain the appropriate environment across all nodes in the
cluster. To achieve this goal, the GPFS commands use the remote shell and remote file copy commands
that you specify on either the mmcrcluster or the mmchcluster command.
The default remote commands are ssh and scp, but you can designate any other remote commands
provided they have compatible syntax.
The way the passwordless access is achieved depends on the particular remote execution program and
authentication mechanism that is used. For example, for rsh and rcp, you might need a properly
configured .rhosts file in the root user's home directory on each node in the GPFS cluster. If the remote
program is ssh, you can use private identity files that do not have a password. Or, if the identity file is
password-protected, you can use the ssh-agent utility to establish an authorized session before you issue
mm commands.
You can avoid configuring your GPFS nodes to allow remote access to the root user ID, by using sudo
wrapper scripts to run GPFS administrative commands. See “Running IBM Spectrum Scale commands
without remote root login” on page 17.
GPFS does not need to know which nodes are being used for administration purposes. It is the
administrator's responsibility to issue mm commands only from nodes that are properly configured and
can access the rest of the nodes in the cluster.
Note: If your cluster includes Windows nodes, you must designate ssh and scp as the remote
communication program.
The adminMode attribute is set with the mmchconfig command and can have one of two values:
allToAll
Indicates that all nodes in the cluster can be used for running GPFS administration commands and
that all nodes are able to execute remote commands on any other node in the cluster without the
need of a password.
The major advantage of this mode of operation is that GPFS can automatically recover missing or
corrupted configuration files in almost all circumstances. The major disadvantage is that all nodes in
the cluster must have root level access to all other nodes.
central
Indicates that only a subset of the nodes will be used for running GPFS commands and that only
those nodes will be able to execute remote commands on the rest of the nodes in the cluster without
the need of a password.
The major advantage of this mode of administration is that the number of nodes that must have root
level access to the rest of the nodes is limited and can be as low as one. The disadvantage is that
GPFS may not be able to automatically recover from loss of certain configuration files. For example, if
the SSL key files are not present on some of the nodes, the operator may have to intervene to recover
the missing data. Similarly, it may be necessary to shut down GPFS when adding new quorum
nodes. If an operator intervention is needed, you will see appropriate messages in the GPFS log or on
the screen.
Note List:
1. Any node used for the IBM Spectrum Scale GUI is considered as an administrative node and
must have the ability to execute remote commands on all other nodes in the cluster without the
need of a password as the root user or as the configured gpfs admin user.
Clusters created with the GPFS 3.3 or later level of the code have adminMode set to central by default.
Clusters migrated from GPFS 3.2 or earlier versions will continue to operate as before and will have
adminMode set to allToAll.
You can change the mode of operations at any time with the help of the mmchconfig command. For
example, to switch the mode of administration from allToAll to central, issue:
mmchconfig adminMode=central
Use the mmlsconfig adminMode command to display the mode of administration currently in effect for
the cluster.
User-defined node classes are created with the mmcrnodeclass command. After a node class is
created, it can be specified as an argument on commands that accept the -N NodeClass option.
User-defined node classes are managed with the mmchnodeclass, mmdelnodeclass, and
mmlsnodeclass commands.
NodeFile
A file that contains a list of nodes. A node file can contain individual nodes or node ranges.
For commands operating on a file system, the stripe group manager node is always implicitly included in
the node list. Not every GPFS command supports all of the node specification options described in this
topic. To learn what kinds of node specifications are supported by a particular GPFS command, see the
relevant command description in Command reference in IBM Spectrum Scale: Command and Programming
Reference.
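For example, a user-defined node class might be created and then passed to a command with the -N
option as follows; the class and node names are illustrative:
mmcrnodeclass gatewayNodes -N node1,node2,node3
mmlsnodeclass gatewayNodes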
Stanza files
The input to a number of GPFS commands can be provided in a file organized in a stanza format.
A stanza is a series of whitespace-separated tokens that can span multiple lines. The beginning of a
stanza is indicated by the presence of a stanza identifier as the first token on a line. Stanza identifiers
consist of the % character, followed by a keyword, and ending with a colon.
A stanza identifier is followed by one or more stanza clauses describing different properties of the object.
A stanza clause is defined as an Attribute=value pair.
Lines that start with the # (pound sign) character are considered comment lines and are ignored.
Similarly, you can imbed inline comments following a stanza clause; all text after the # character is
considered a comment.
For more information about the IBM Spectrum Scale RAID commands that use stanzas, see IBM Spectrum
Scale RAID: Administration in Elastic Storage Server (ESS) documentation on IBM Knowledge Center.
A stanza file can contain multiple types of stanzas. Commands that accept input in the form of stanza
files expect the stanzas to be syntactically correct but will ignore stanzas that are not applicable to the
particular command. Similarly, if a particular stanza clause has no meaning for a given command, it is
ignored.
For backward compatibility, a stanza file may also contain traditional NSD descriptors, although their use
is discouraged.
# Example for a directly attached disk; most values are allowed to default
%nsd: nsd=DATA6 device=/dev/hdisk6 failureGroup=3
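A slightly fuller sketch of an NSD stanza follows; the values shown are illustrative only:
%nsd: nsd=DATA7
  device=/dev/hdisk7
  servers=node1,node2
  usage=dataAndMetadata
  failureGroup=3
  pool=system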
Most IBM Spectrum Scale commands run within the GPFS daemon on the file system manager node.
Even if you start a command on another node of the cluster, the node typically sends the command to
the file system manager node to be executed. (Two exceptions are the mmdiag command and the
mmfsadm dump command, which run on the node where they were started.)
To list the active commands on the file system manager node, follow these steps:
1. Enter the mmlsmgr command with no parameters to discover which node is the file system manager
node. For more information on other options available for the mmlsmgr command, see mmlsmgr
command in IBM Spectrum Scale: Command and Programming Reference guide. In the following example,
the mmlsmgr command reports that node05 is the file system manager node:
# mmlsmgr
file system manager node
---------------- ------------------
gpfs1 192.168.145.14 (node05)
2. On the file system manager node, enter the mmdiag --commands command to list the active
commands.
The output indicates that two commands are running: the mmdiag --commands command that you
just entered and the mmrestripefs command, which was started from another node.
Note: The output contains two lines about active commands. Each line begins with the term cmd and
wraps to the next line. You might be interested in the following fields:
start The system time at which the command was received.
SG The name of the file system, or None.
line The command as received by the GPFS daemon.
The remaining input is detailed debugging data that is used for product support. For more
information on mmdiag command output, see the topic mmdiag command in the IBM Spectrum Scale:
Command and Programming Reference guide.
Important: Proper operation of IBM Spectrum Scale depends on reliable TCP/IP communication among
the nodes of a cluster. Before you create or reconfigure an IBM Spectrum Scale cluster, ensure that proper
host name resolution and ICMP echo (network ping) are enabled among the nodes.
With the mmnetverify command, you can do many types of network checks either before or after you
create or reconfigure a cluster. Run the command beforehand to verify that the nodes can communicate
properly. Run the command afterward at any time to verify communication or to analyze a network
problem. For more information, see the topic mmnetverify command in the IBM Spectrum Scale: Command
and Programming Reference.
The mmnetverify command uses the concepts of local nodes and target nodes. A local node is a node from
which a network test is run. You can enter the command on one node and have it run on multiple
separate local nodes. A target node is a node against which a test is run.
You can run tests on one node against multiple nodes. The following command runs tests on node1
against node2 and then on node1 against node3:
mmnetverify connectivity -N node1 --target-nodes node2,node3
You can also run tests on multiple nodes against multiple nodes. The following command runs tests on
node1 against node1 and node2 and then on node2 against node1 and node2:
mmnetverify connectivity -N node1,node2 --target-nodes node1,node2
It is not necessary to enter the command from a node that is involved in testing. For example, you can
run the following command from node1, node2, node3, or any other node in the cluster:
mmnetverify data -N node1 --target-nodes node2,node3
To run tests against all the nodes in the cluster, omit the --target-nodes parameter (example 1). Similarly,
to run the test on all the nodes in the cluster, omit the -N parameter (example 2):
(1) mmnetverify data-medium -N node1
(2) mmnetverify data-medium --target-nodes node2,node3,node4
The groups of tests include connectivity, port, data, bandwidth, and flood tests. You can run tests
individually or as a group. For example, you can run resolution, ping, shell, and copy tests individually,
or you can run all of them by specifying the keyword connectivity.
The command writes the results of tests to the console by default, or to a log file as in the following
example:
mmnetverify port -N node1 --target-nodes all --log-file results.log
If you are running tests against nodes that are not organized into a cluster, you must specify the nodes in
a configuration file. The file must at minimum contain a list of the nodes in the test. You must also
include the node from which you are starting the command:
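A minimal sketch of such a configuration file follows. It assumes one node statement per line; the
keyword and node names shown are illustrative and must be checked against the mmnetverify
command description:
node node1
node node2
node node3
node node4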
Run the command in the usual way and include the configuration file:
mmnetverify ping -N node1,node2,node3,node4 --target-nodes
node1,node2,node3,node4 --configuration-file config.txt
You can also use the configuration file for other purposes, such as specifying a nondefault shell command
or file copy command.
Related information:
See the topic mmnetverify command in the IBM Spectrum Scale: Command and Programming Reference.
For information on how to create GPFS file systems, see A sample file system creation in IBM Spectrum Scale:
Concepts, Planning, and Installation Guide and the mmcrfs command.
Managing filesets, storage pools and policies is also a file system management task. For more information
on managing storage pools, filesets and policies, see Chapter 26, “Information lifecycle management for
IBM Spectrum Scale,” on page 363. Use the following information to manage file systems in IBM
Spectrum Scale.
To work with this function in the GUI, log on to the IBM Spectrum Scale GUI and select Files > File
Systems. For more information on managing file systems through GUI, see “Creating and managing file
systems using GUI” on page 155.
If you allowed the default value for the automatic mount option (-A yes) when you created the file
system, then you do not need to use this procedure after restarting GPFS on the nodes.
To mount a file system, issue the mmmount command:
mmmount device
where device is the name of the file system. For example, to mount the file system fs1, enter:
mmmount fs1
To mount file system fs1 on all nodes in the GPFS cluster, issue this command:
mmmount fs1 -a
To mount a file system only on a specific set of nodes, use the -N flag of the mmmount command.
All of the mount options can be specified using the -o parameter. Multiple options should be separated
only by a comma. If an option is specified multiple times, the last instance is the one that takes effect.
Certain options can also be set with specifically designated command flags. Unless otherwise stated,
mount options can be specified as:
The option={1 | 0 | yes | no} syntax should be used for options that can be intercepted by the mount
command and not passed through to GPFS. An example is the atime option in the Linux environment.
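For example, a file system might be mounted read-only with atime updates disabled on a subset of
nodes; this is a sketch only, and the options and node names are illustrative:
mmmount fs1 -o ro,atime=0 -N node1,node2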
The GUI has the following options related to mounting the file system:
1. Mount local file systems on nodes of the local IBM Spectrum Scale cluster.
2. Mount remote file systems on local nodes.
3. Select individual nodes, protocol nodes, or nodes by node class while selecting nodes on which the
file system needs to be mounted.
4. Prevent or allow file systems from mounting on individual nodes.
Do the following to prevent file systems from mounting on a node:
a. Go to Nodes .
b. Select the node on which you need to prevent or allow file system mounts.
c. Select Prevent Mounts from the Actions menu.
d. Select the required option and click Prevent Mount or Allow Mount based on the selection.
5. Configure the automatic mount option. The automatic mount option determines whether to
automatically mount the file system on nodes when the GPFS daemon starts or when the file system is
accessed for the first time. You can also specify whether to exclude individual nodes while enabling
the automatic mount option. To enable automatic mount, do the following:
a. From the Files > File Systems page, select the file system for which you need to enable automatic
mount.
b. Select Configure Automatic Mount option from the Actions menu.
c. Select the required option from the list of automatic mount modes.
d. Click Configure.
To change a file system mount point on protocol nodes, perform the following steps:
1. Unmount the file system:
mmumount fs0 -a
2. Change the mount point:
mmchfs fs0 -T /ibm/new_fs0
3. Change the path of all NFS and SMB exports.
Note: The mmnfs export change and the mmsmb export change commands do not allow path names to
be edited. Therefore, the export needs to be removed and re-added.
4. Change the object CCR files:
v account-server.conf
v container-server.conf
v object-server.conf
v object-server-sof.conf
v spectrum-scale-object.conf
v spectrum-scale-objectizer.conf
The parameter that you need to change varies depending on the configuration file.
a. Use the mmobj config list command to list the parameters for the file. For example, to list
the parameters for the object-server.conf file, enter:
mmobj config list --ccrfile object-server.conf --section DEFAULT --property devices
b. Use the mmobj config change --ccrfile file name to change the parameter. For example, to
change the object-server.conf file, enter:
mmobj config change --ccrfile object-server.conf --section DEFAULT --property devices
/newFS/name
If the file system does not unmount, see the File system fails to unmount section in the IBM Spectrum Scale:
Problem Determination Guide.
To unmount a file system, issue the mmumount command:
mmumount device
where device is the name of the file system. For example, to unmount the file system fs1, enter:
mmumount fs1
To unmount file system fs1 on all nodes in the GPFS cluster, issue this command:
mmumount fs1 -a
To unmount a file system only on a specific set of nodes, use the -N flag of the mmumount command.
You can utilize the following unmount features that are supported in the GUI:
1. Unmount local file system from local nodes and remote nodes.
2. Unmount a remote file system from the local nodes. When a local file system is unmounted from the
remote nodes, the remote nodes can no longer be seen in the GUI. The Files > File Systems > View
Details > Remote Nodes page lists the remote nodes that currently mount the selected file system.
The selected file system can be a local or a remote file system, but the GUI permits you to unmount only
local file systems from the remote nodes.
3. Select individual nodes, protocol nodes, or nodes by node class while selecting nodes from which the
file system needs to be unmounted.
4. Specify whether to force unmount. Selecting the Force unmount option while unmounting the file
system unmounts the file system even if it is still busy performing I/O operations. Forcing the
unmount operation affects the outstanding operations and can cause data integrity issues. The IBM
Spectrum Scale system relies on the native unmount command to carry out the unmount operation.
The semantics of forced unmount are platform-specific. On certain platforms such as Linux, even
when forced unmount is requested, the file system cannot be unmounted if it is still referenced by the system
kernel. To unmount a file system in such cases, identify and stop the processes that are referencing the
file system. You can use system utilities like lsof and fuser for this.
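For example, on Linux you might identify the processes that keep the file system busy before you
attempt a forced unmount; the mount point shown is illustrative:
lsof /gpfs/fs1
fuser -mv /gpfs/fs1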
Specify the file system to be deleted on the mmdelfs command. For example, to delete the file system
fs1, enter:
mmdelfs fs1
The system displays output similar to the following:
gpfs9nsd
gpfs13nsd
gpfs11nsd
gpfs12nsd
Note that the mmlsmount -L command reports file systems that are in use at the time the command is
issued. A file system is considered to be in use if it is explicitly mounted with the mount or mmmount
command or if it is mounted internally for the purposes of running some other GPFS command. For
example, when you run the mmrestripefs command, the file system will be internally mounted for the
duration of the command. If mmlsmount is issued in the interim, the file system will be reported as
being in use by the mmlsmount command but, unless it is explicitly mounted, will not show up in the
output of the mount or df commands. For more information, see the topic The mmlsmount command in
the IBM Spectrum Scale: Problem Determination Guide.
This is an example of the output of the mmlsmount -L command for a mounted file system named fs1:
File system fs1 (mnsd.cluster:fs1) is mounted on 5 nodes:
9.114.132.101 c5n101 mnsd.cluster
9.114.132.100 c5n100 mnsd.cluster
9.114.132.106 c5n106 mnsd.cluster
9.114.132.97 c5n97 cluster1.cluster
9.114.132.92 c5n92 cluster1.cluster
The online mode operates on a mounted file system and is selected by specifying the -o option. Conversely,
the offline mode operates on an unmounted file system. In general, it is unnecessary to run mmfsck in
offline mode unless you are directed to do so by the IBM Support Center.
The online mode checks and recovers only unallocated blocks on a mounted file system. If a GPFS file
operation fails due to an out of space condition, the cause might be disk blocks that are unavailable after
repeated node failures. The corrective action that is taken is to mark the block free in the allocation map.
Any other inconsistencies that are found are only reported, not repaired.
Note:
1. If you are running the online mmfsck command to free allocated blocks that do not belong to any files,
plan to make file system repairs when system demand is low. This is an I/O-intensive activity and it can
affect system performance.
2. If you are repairing a file system due to node failure and the file system has quotas that are enabled,
it is suggested that you run the mmcheckquota command to make quota accounting consistent.
To repair any other inconsistencies, you must run the offline mode of the mmfsck command on an
unmounted file system. The offline mode checks for these file inconsistencies that might cause problems:
v Blocks marked allocated that do not belong to any file. The corrective action is to mark the block free
in the allocation map.
v Files and directories for which an inode is allocated and no directory entry exists, known as orphaned
files. The corrective action is to create directory entries for these files in a lost+found subdirectory in
the root directory of the fileset to which the file or directory belongs. A fileset is a subtree of a file
system namespace that in many respects behaves like an independent file system. The index number of
the inode is assigned as the name. If you do not allow the mmfsck command to reattach an orphaned
file, it asks for permission to delete the file.
The mmfsck command performs other functions that are not listed here, as deemed necessary by GPFS.
The --patch-file parameter of the mmfsck command can be used to generate a report of file system
inconsistencies. Consider this example of a patch file that is generated by mmfsck for a file system with a
bad directory inode:
gpfs_fsck
<header>
sgid = "C0A87ADC:5555C87F"
disk_data_version = 1
fs_name = "gpfsh0"
#patch_file_version = 1
#start_time = "Fri May 15 16:32:58 2015"
#fs_manager_node = "h0"
#fsck_flags = 150994957
</header>
<patch_inode>
patch_type = "dealloc"
snapshot_id = 0
inode_number = 50432
</patch_inode>
<patch_block>
snapshot_id = 0
inode_number = 3
block_num = 0
indirection_level = 0
generation_number = 1
is_clone = false
is_directory_block = true
rebuild_block = false
#num_patches = 1
<patch_dir>
entry_offset = 48
entry_fold_value = 306661480
delete_entry = true
</patch_dir>
</patch_block>
<patch_block>
snapshot_id = 0
inode_number = 0
block_num = 0
indirection_level = 0
generation_number = 4294967295
is_clone = false
<patch_field>
record_number = 3
field_id = "inode_num_links"
new_value = 2
old_value = 3
</patch_field>
</patch_block>
<patch_inode>
patch_type = "orphan"
snapshot_id = 0
inode_number = 50433
</patch_inode>
<footer>
#stop_time = "Fri May 15 16:33:06 2015"
#num_sections = 203
#fsck_exit_status = 8
need_full_fsck_scan = false
</footer>
The mmfsck command can be run with both the --patch-file and --patch parameters to repair a file
system with the information that is stored in the patch file. Using a patch file prevents a subsequent scan
of the file system before the repair actions begin.
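For example, assuming an unmounted file system named fs1 and an illustrative patch-file path, a
sequence like the following one first records the inconsistencies without repairing them and then repairs
the file system from the recorded patch file:
mmfsck fs1 -n --patch-file /tmp/fs1.patch
mmfsck fs1 --patch-file /tmp/fs1.patch --patch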
You cannot run the mmfsck command on a file system that has disks in a down state. You must first run
the mmchdisk command to change the state of the disks to unrecovered or up. To display the status of
the disks in the file system, issue the mmlsdisk command.
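For example, a command sequence like the following one starts all down disks in file system fs1 and
then displays their status:
mmchdisk fs1 start -a
mmlsdisk fs1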
To check the file system fs1 without making any changes to the file system, issue the following
command:
mmfsck fs1
For complete usage information, see mmchdisk command, mmcheckquota command, mmfsck command, and
mmlsdisk command in IBM Spectrum Scale: Command and Programming Reference
The first time a file system is mounted, a periodic validation of the NSD, disk, and stripe group
descriptors is started. By default, this validation occurs every five seconds. The NSD, disk, and stripe
group descriptors are read and compared with the corresponding descriptors in memory or cache. If
there is a mismatch, that information is logged and, if appropriate, the corrupted data is fixed using data
from cache.
| Overview
| Use file system maintenance mode whenever you perform maintenance on either NSD disks or NSD
| servers that might result in NSDs becoming unavailable. You cannot change any user files or file system
| metadata while the file system is in maintenance mode. Because the NSD disks and NSD server nodes
| are known to be unavailable for maintenance, the system does not mark them down when I/O failures
| occur on those disks. Administrators can then easily complete administrative actions on the NSD disks
| or NSD server nodes.
| IBM Spectrum Scale file system operations that must internally mount the file system cannot be used
| while the file system is in maintenance mode. Other file system administrative operations, such as the
| operations run by the mmlsfs and mmlsdisk commands, can check the file system information.
| You can move the file system into maintenance mode to prevent unexpected or unwanted disk I/O
| operations in the file system when maintenance actions are applied to either the NSD disk systems or file
| system server nodes. I/O failures from any NSD disks or server nodes that are not available might result
| in disks that are marked as down if you do not move the file system into maintenance mode. Any disks
| that are marked as down must be manually started by using the mmchdisk command, which might take
| significant time for a large file system.
| Additionally, no ordering assurance exists for the IBM Spectrum Scale nodes when you start or shut
| down nodes across the cluster. So, if the NSD servers are being shut down earlier than client nodes or
| started up later than client nodes, some NSD disks might also be marked down if I/O operations are run
| on those NSD server nodes. Unless the file system is in maintenance mode, you must manually control
| the shutdown or startup sequence for cluster nodes to avoid disk down events.
| You can move the file system into maintenance mode before you shut down the cluster, or before you
| mount the file system during the startup process, so that you do not need to control the shutdown or
| startup order of the nodes. When a file system is remotely mounted and accessed, move it into
| maintenance mode before you shut down the NSD servers in the home cluster. Users of the remote file
| system might not be aware of the home cluster status, and I/O operations that are initiated from the
| remote cluster might otherwise cause file system disks to be marked down.
| You can enable, disable, or check the status of file system maintenance mode:
| v To enable or disable file system maintenance mode, enter the following command:
| mmchfs <fsName> --maintenance-mode {yes [--wait] | no}
| v To check the status of file system maintenance mode, enter the following command:
| mmlsfs <fsName> --maintenance-mode
| Before you enter the mmchfs command to enable file system maintenance mode, make sure that you
| unmount the file system on the local and remote clusters. Additionally, long running commands such as
| mmrestripefs must complete because they internally mount the file system. If you cannot wait for
| long-running commands to finish, you must specify the --wait parameter. The --wait parameter waits for
| existing mounts and long-running commands to complete and then moves the file system into
| maintenance mode.
| You can apply maintenance on network shared disk (NSD) disks or server nodes:
| 1. Unmount the file system from all nodes, including remote cluster nodes. Enter the following
| command:
| mmumount <fsName> -a
| 2. Check whether any pending internal mounts exist. Enter the following command:
| mmlsmount <fsName> -L
| 3. Enter the following command to enable maintenance mode:
| mmchfs <fsName> --maintenance-mode yes
| Note: File system mount operations and other management operations that internally mount the file
| system, such as mmmount and mmrestripefs, cannot run in this state:
| mmmount <fsName>
| Mon Jul 23 06:02:49 EDT 2018: 6027-1623
| mmmount: Mounting file systems ...
| mount: permission denied
| mmmount: 6027-1639 Command failed. Examine previous error messages to determine cause.
| mmrestripefs <fsName> -b
| This file system is undergoing maintenance and cannot be either mounted or changed.
| mmrestripefs: 6027-1639 Command failed. Examine previous error messages to determine cause.
| 4. Perform the required maintenance on the NSD disks or NSD server nodes.
| 5. Resume the normal file system operations such as mmmount after maintenance is complete. End the
| maintenance mode only after the NSD disks and NSD servers are operational:
| mmchfs <fsName> --maintenance-mode no
| You can run offline fsck to check file system consistency before you end file system maintenance
| mode.
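| For example, assuming a file system named fs1 that is unmounted and in maintenance mode, you can
| run an offline check with a command like the following one:
| mmfsck fs1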
| CAUTION:
| v If you shut down either the NSD servers or the whole cluster, it is considered maintenance on NSD
| disks or servers and must be done under maintenance mode.
| v If no NSD disks or NSD server nodes are available for a specified file system, the file system
| maintenance mode state cannot be retrieved because it is stored with the stripe group descriptor.
| Additionally, you cannot resume the file system maintenance mode in this scenario.
| Running the fsck service action while the file system is in maintenance mode
| The offline fsck service action can be run while the file system is in maintenance mode. Maintenance
| mode is used to provide a dedicated timing window to check file system consistency when:
| v The offline fsck service action cannot be started while the file system is being used.
| v The offline fsck service action cannot be started due to some unexpected interfering file system mount
| or other management operations.
| Note: Such interfering mount or other management operations fail when you run them while the file
| system is in maintenance mode.
| See also
| v mmchfs
| v mmlsfs
|
Listing file system attributes
Use the mmlsfs command to display the current file system attributes. Depending on your configuration,
additional information that is set by GPFS can be displayed to help in problem determination when you
contact the IBM Support Center.
If you specify no options with the mmlsfs command, all file system attributes are listed.
For example, to list all of the attributes for the file system gpfs1, enter:
mmlsfs gpfs1
Some of the attributes that are displayed by the mmlsfs command represent default mount options.
Because the scope of mount options is an individual node, it is possible to have different values on
different nodes. For exact mtime (-E option) and suppressed atime (-S option), the information that is
displayed by the mmlsfs command represents the current setting on the file system manager node. If
these options are changed with the mmchfs command, the change might not be reflected until the file
system is remounted.
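For example, to display only the exact mtime and suppressed atime settings for the file system gpfs1,
you can enter a command like the following one:
mmlsfs gpfs1 -E -S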
For complete usage information, see mmlsfs command in IBM Spectrum Scale: Command and Programming
Reference. For a detailed discussion of file system attributes, see GPFS architecture and File system creation
considerations in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Note: All files created after issuing the mmchfs command take on the new attributes. Existing files are
not affected. Use the mmchattr or mmrestripefs -R command to change the replication factor of existing
files. See “Querying and changing file replication attributes.”
For example, to change the default data replication factor to 2 for the file system fs1, enter:
mmchfs fs1 -r 2
For complete usage information, see mmchfs command and mmlsfs command in IBM Spectrum Scale: Command
and Programming Reference. For a detailed discussion of file system attributes, see GPFS architecture and
File system creation considerations in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
For example, to display the replication factors for two files named project4.sched and project4.resource
in the file system fs1, enter:
mmlsattr /fs1/project4.sched /fs1/project4.resource
See the mmlsattr command in IBM Spectrum Scale: Command and Programming Reference for complete usage
information. For a detailed discussion of file system attributes, see GPFS architecture and File system
creation considerations in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
You can only increase data and metadata replication as high as the maximum data and maximum
metadata replication factors for that file system. You cannot change the maximum data and maximum
metadata replication factors once the file system has been created.
Specify the file name, attribute, and new value with the mmchattr command. For example, to change the
metadata replication factor to 2 and the data replication factor to 2 for the file named project7.resource in
the file system fs1, enter:
mmchattr -m 2 -r 2 /fs1/project7.resource
See the mmchattr command and the mmlsattr command in IBM Spectrum Scale: Command and Programming
Reference for complete usage information. For a detailed discussion of file system attributes, see GPFS
architecture and File system creation considerations in IBM Spectrum Scale: Concepts, Planning, and Installation
Guide.
This caching policy bypasses file cache and transfers data directly from disk into the user space buffer, as
opposed to using the normal cache policy of placing pages in kernel memory. Applications with poor
cache hit rates or very large I/Os may benefit from the use of Direct I/O.
File compression
You can compress or decompress files either with the mmchattr command or with the mmapplypolicy
command with a MIGRATE rule. You can do the compression or decompression synchronously or defer it
until a later call to mmrestripefile or mmrestripefs.
Beginning with IBM Spectrum Scale version 5.0.0, file compression supports two compression libraries,
zlib and lz4. Zlib is intended primarily for cold data and favors saving space over read-access speed. Lz4
is intended primarily for active data and favors read-access speed over maximized space saving.
Administrators can create policies that select a compression library based on the access characteristics of
the file to be compressed, with file-level granularity.
Note:
v The lz4 compression library requires file system format version 5.0.0 or later (file system format
number 18.00 or later).
v The zlib compression library requires file system format version 4.2.0 or later (file system format
number 15.01 or later).
For more information about file compression, see the following subtopics:
v “Comparison with object compression”
v “When to use file compression”
v “Setting up file compression and decompression” on page 129
v “Warning” on page 130
v “Reported size of compressed files” on page 130
v “Deferred file compression” on page 130
v “Indicators of file compression or decompression” on page 130
v “Partially compressed files” on page 131
v “Updates to compressed files” on page 132
v “File compression and memory mapping” on page 132
v “File compression and direct I/O” on page 132
v “Backing up and restoring compressed files” on page 132
v “FPO environment” on page 133
v “AFM environment” on page 133
v “Limitations” on page 133
File compression and object compression use the same compression technology but are available in
different environments and are configured in different ways. Object compression is available in the
Cluster Export Services (CES) environment and is configured with the mmobj policy command. With
object compression, you can create an object storage policy that periodically compresses new objects and
files in a GPFS fileset.
File compression is available in non-CES environments and is configured with the mmapplypolicy
command or directly with the mmchattr command.
You can do file compression or decompression with either the mmchattr command or the mmapplypolicy
command.
Note: File compression and decompression with the mmapplypolicy command is not supported on
Windows.
With the mmchattr command, you specify the --compression option and the names of the files or filesets
that you want to compress or decompress. For example, the following command compresses a file with
the lz4 compression library:
mmchattr --compression lz4 trcrpt.150913.13.30.13.3518.txt
For more information, see the topic mmchattr command in the IBM Spectrum Scale: Command and
Programming Reference.
With the mmapplypolicy command, you create a MIGRATE rule that specifies the COMPRESS option and run
mmapplypolicy to apply the rule. For example, the following rule selects files with names that contain the
string green from the datapool storage pool and compresses them with the zlib library:
RULE ’COMPR1’ MIGRATE FROM POOL ’datapool’ COMPRESS(’z’) WHERE NAME LIKE ’green%’
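To apply such a rule, you can save it in a policy file (the file name compress.rules here is illustrative) and
run the mmapplypolicy command against the file system:
mmapplypolicy fs1 -P compress.rules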
When you do file compression, you can defer the compression operation to a later time. For more
information, see the subtopic “Deferred file compression” on page 130.
Doing any of the following operations while the mmrestorefs command is running can corrupt file data:
v Doing file compression or decompression. This includes compression or decompression with the
mmchattr command or with a policy and the mmapplypolicy command.
v Running the mmrestripefile command or the mmrestripefs command. Do not run either of these
commands for any reason while mmrestorefs is running, not even to complete a deferred file
compression or decompression.
After a file is compressed, operating system commands, such as ls -l, display the uncompressed size.
Use du or the GPFS command mmdf to display the actual, compressed size. You can also make the stat()
system call to find how many blocks the file occupies.
To defer the compression, with either command, specify the -I defer option. For example, the following
command marks the specified file as needing compression but defers the compression operation:
mmchattr -I defer --compression yes trcrpt.150913.13.30.13.3518.txt
With the mmapplypolicy command, the -I defer option defers compression or decompression and data
movement or deletion. For example, the following command applies the rules in the file policyfile but
defers the file operations that are specified in the rules, including compression or decompression:
mmapplypolicy fs1 -P policyfile -I defer
The mmlsattr command displays two indicators that together describe the state of compression or
decompression of the specified file:
COMPRESSION
The mmlsattr command displays the COMPRESSION flag on the Misc attributes line of its output.
The flag is followed in parentheses by the name of the compression library that was used to
compress the file. See the example of mmlsattr output in Figure 3 on page 131. If present, the
COMPRESSION flag indicates that the file is compressed or is marked for deferred compression. If
absent, the absence indicates that the file is uncompressed or is marked for deferred
decompression.
Note: This flag reflects the state of the GPFS_IWINFLAG_COMPRESSED flag in the
gpfs_iattr64_t structure of the inode of the file. For more information about this structure, see
the topic gpfs_iattr64_t_structure in the IBM Spectrum Scale: Command and Programming Reference.
illCompressed
The mmlsattr command displays the illCompressed flag on the flags line of its output. If present,
the flag indicates that compression or decompression of the file is deferred or not yet completed.
Note:
v This flag reflects the state of the GPFS_IAFLAG_ILLCOMPRESSED flag in the gpfs_iattr64_t
structure of the inode of the file. For more information about this structure, see the topic
gpfs_iattr64_t_structure in the IBM Spectrum Scale: Command and Programming Reference.
v Some file system events can cause the illCompressed flag to be set. Consider the following
examples:
– When data is written into an already compressed file, the existing data remains compressed
but the new data is uncompressed. The illCompressed flag is set for this file.
– When a compressed file is memory-mapped, the memory-mapped area of the file is
decompressed before it is read into memory. The illCompressed flag is set for this file.
For more information, see the subtopic “Updates to compressed files” on page 132.
In the following example, the output from the mmlsattr command includes both the COMPRESSION flag and
the illCompressed flag. This combination indicates that the file is marked for compression but that
compression is not completed:
mmlsattr -L green02.51422500687
file name: green02.51422500687
metadata replication: 1 max 2
data replication: 2 max 2
immutable: no
appendOnly: no
flags: illCompressed
storage pool name: datapool
fileset name: root
snapshot name:
creation time: Wed Jan 28 19:05:45 2015
Misc attributes: ARCHIVE COMPRESSION (library lz4)
Encrypted: no
Together the COMPRESSION and illCompressed flags indicate the compressed or uncompressed state of the
file. See the following table:
Table 16. COMPRESSION and illCompressed flags
State of the file COMPRESSION is displayed? illCompressed is displayed?
Uncompressed. No No
Decompression is not complete. No Yes
Compressed. Yes No
Compression is not complete. Yes Yes
The COMPRESSION flag of a file is set when the user selects the file to be compressed by the mmchattr
--compression yes command or by a policy run. The flag indicates that the user wants the file to be
compressed.
The compressibility of a file can change over time if its contents are changed. Different parts of a file may
have different compressibility. Based on the 10% space-saving criterion (see the subtopic “Limitations” on
page 133), some compression groups (in granularity of 10 data blocks) of a file might be compressed
while others are not.
In sum, the state of the COMPRESSION flag, on or off, indicates the intention of the user to compress the file
or not. The illCompressed flag indicates the compression execution status. The actual compression status
of the data blocks depends on the illCompressed and COMPRESSION flags and the compressibility of the
current data.
The mmrestorefs command can cause a compressed file in the active file system to become decompressed
if it is overwritten by the restore process. To recompress the file, run the mmrestripefile command with
the -z option.
For more information, see the preceding subtopic “Deferred file compression” on page 130.
As a convenience, the file system does not compress an uncompressed file or partially decompressed file
if the file is memory-mapped. Compressing the file would not be effective because memory mapping
decompresses any compressed data in the regions that are paged in.
You can open a compressed file for Direct I/O, but internally the direct I/O reads and writes are replaced
by buffered decompressed I/O reads and writes.
As a convenience, the file system does not compress a file that is opened for Direct I/O. Compressing the
file would not be effective because direct I/O would be replaced by buffered decompressed I/O.
Files are decompressed when they are moved out of storage that is directly managed by IBM Spectrum
Scale. This fact affects file backups by products such as IBM Spectrum Protect, IBM Spectrum Protect for
Space Management (HSM), IBM Spectrum Archive™, Transparent cloud tiering (TCT), and others. When
you restore a file to the IBM Spectrum Scale file system, the file data remains uncompressed but
the illCompressed flag is still set. You can recompress the file by running mmrestripefs or
mmrestripefile with the -z option.
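For example, a command like the following one (the file system name is illustrative) recompresses such
files across the whole file system:
mmrestripefs fs1 -z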
FPO environment
File compression supports a File Placement Optimizer (FPO) environment or horizontal storage pools.
FPO block group factor: Before you compress files in a File Placement Optimizer (FPO) environment,
you must set the block group factor to a multiple of 10. If you do not, then data block locality is not
preserved and performance is slower.
For compatibility reasons, before you do file compression with FPO files, you must upgrade the whole
cluster to version 4.2.1 or later. To verify that the cluster is upgraded, follow these steps:
1. At the command line, enter the mmlsconfig command with no parameters.
2. In the output, verify that minReleaseLevel is 4.2.1 or later.
AFM environment
Files that belong to AFM and AFM DR filesets can also be compressed and decompressed. Compressed
file contents are decompressed before being transferred from home to cache or from primary to
secondary.
Before you do file compression with AFM and AFM DR, you must upgrade the whole cluster to version
5.0.0.
Limitations
With QoS, you can prevent I/O-intensive, long-running GPFS maintenance commands from dominating
file system I/O performance and significantly delaying other tasks. Commands like the examples in
Figure 4 can generate hundreds or thousands of requests for I/O operations per second. The high
demand can greatly slow down normal tasks that are competing for the same I/O resources.
mmrestripefs fsname -N
mmapplypolicy fsname -N all ...
The I/O intensive, potentially long-running GPFS commands are collectively called maintenance commands
and are listed in the help topic for the mmchqos command in the IBM Spectrum Scale: Command and
Programming Reference.
With QoS configured, you can assign an instance of a maintenance command to a QoS class that has a
lower I/O priority. Although the instance now takes longer to run to completion, normal tasks have
greater access to I/O resources and run more quickly.
Note:
v QoS requires the file system to be at V4.2.0.0 or later. To check the file system level, enter the following
command:
mmlsfs fileSystemName -V
v QoS works with asynchronous I/O, memory-mapped I/O, cached I/O, and buffered I/O. However,
with direct I/O, QoS counts the IOPS but does not regulate them.
The following steps provide an overview of how to use QoS. In this overview, assume that the cluster
contains 5 nodes and that the file system fs0 has two storage pools: the system storage pool (system) and
another storage pool sp1.
1. Monitor your file system with the mmlsqos command to determine its maximum capacity in I/O
operations per second (IOPS). Follow these steps:
a. Enable QoS without placing any limits on I/O consumption. A command like the following one sets
the QoS classes of both storage pools to unlimited:
mmchqos fs0 --enable pool=*,maintenance=unlimited,other=unlimited
Note: Make sure that the virtual storage Logical Unit Numbers (LUNs) of different storage pools
do not map to the same physical devices.
By default, QoS divides specific allocations of IOPS evenly among the nodes in the file system. In
this overview there are 5 nodes. So QoS allocates 200 IOPS to the maintenance class of the system
pool and 100 IOPS to the maintenance class of the sp1 storage pool on each node.
Note: You can also divide IOPS among a list of nodes or among the nodes of a node class. For
example, you can use the mmcrnodeclass command to create a class of nodes that do maintenance
commands. You can then divide IOPS among the members of the node class by entering a
command like the following one:
mmchqos fs0 --enable -N nodeClass pool=sp2,maintenance=880IOPS,other=unlimited
If the file system serves remote clusters, you can divide IOPS among the members of a remote
cluster by entering a command like the following one:
mmchqos fs0 --enable -C remoteCluster pool=sp3,maintenance=1000IOPS,other=unlimited
b. Allocate the remaining IOPS to the other classes. It is a good idea to accomplish this task by
setting other to unlimited in each storage class. Then normal tasks can absorb all the IOPS of the
system when no maintenance commands are running. See the third column of the following table:
Table 18. Allocate the available IOPS
Storage pool QoS class: maintenance QoS class: other
system 1000 IOPS (200 IOPS per node) unlimited
sp1 500 IOPS (100 IOPS per node) unlimited
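For example, a command like the following one is a sketch of how the allocations in Table 18 might be
applied; the pool names and IOPS values depend on your environment:
mmchqos fs0 --enable pool=system,maintenance=1000IOPS,other=unlimited pool=sp1,maintenance=500IOPS,other=unlimited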
All maintenance command instances that are running at the same time and that access the same
storage pool compete for the IOPS that you allocated to the maintenance class of that storage pool. If
the IOPS limit of the class is exceeded, then QoS queues the extra I/O requests until more IOPS
become available.
When you change allocations, mount the file system, or reenable QoS, a brief delay due to
reconfiguration occurs before QoS starts applying allocations.
6. To monitor the consumption of IOPS while a maintenance command is running, run the mmlsqos
command. The following command displays the statistics for the preceding 60 seconds during which
a maintenance command was running:
mmlsqos fs0 --seconds 60
See also
v mmchqos command in the IBM Spectrum Scale: Command and Programming Reference
v mmlsqos command in the IBM Spectrum Scale: Command and Programming Reference
Restriping offers the opportunity to specify useful options in addition to rebalancing (-b option).
Re-replicating (-r or -R option) provides for proper replication of all data and metadata. If you use
replication, this option is useful to protect against additional failures after losing a disk. For example, if
you use a replication factor of 2 and one of your disks fails, only a single copy of the data would remain.
If another disk then failed before the first failed disk was replaced, some data might be lost. If you expect
delays in replacing the failed disk, you could protect against data loss by suspending the failed disk
using the mmchdisk command and re-replicating. This would assure that all data existed in two copies
on operational disks.
If files are assigned to one storage pool, but with data in a different pool, the placement (-p) option will
migrate their data to the correct pool. Such files are referred to as ill-placed. Utilities, such as the
mmchattr command or policy engine, may change a file's storage pool assignment, but not move the
data. The mmrestripefs command may then be invoked to migrate all of the data at once, rather than
migrating each file individually. Note that the rebalance (-b) option also performs data placement on all
files, whereas the placement (-p) option rebalances only the files that it moves.
If you do not replicate all of your files, the migrate (-m) option is useful to protect against data loss when
you have an advance warning that a disk may be about to fail, for example, when the error logs show an
excessive number of I/O errors on a disk. Suspending the disk and issuing the mmrestripefs command
with the -m option is the quickest way to migrate only the data that would be lost if the disk failed.
If you do not use replication, the -m and -r options are equivalent; their behavior differs only on
replicated files. After a successful re-replicate (-r option) all suspended disks are empty. A migrate
operation, using the -m option, leaves data on a suspended disk as long as at least one other replica of
the data remains on a disk that is not suspended.
Use the -z option to perform any deferred or incomplete compression or decompression of files in the file
system.
Consider the necessity of restriping and the current demands on the system. New data which is added to
the file system is correctly striped. Restriping a large file system requires extensive data copying and may
affect system performance. Plan to perform this task when system demand is low.
If you are sure you want to proceed with the restripe operation:
1. Use the mmchdisk command to suspend any disks to which you do not want the file system
restriped. You may want to exclude disks from file system restriping because they are failing. See
“Changing GPFS disk states and parameters” on page 171.
2. Use the mmlsdisk command to assure that all disk devices to which you do want the file system
restriped are in the up/normal state. See “Displaying GPFS disk states” on page 170.
Specify the target file system with the mmrestripefs command. For example, to rebalance (-b option) file
system fs1 after adding an additional RAID device, enter:
mmrestripefs fs1 -b
Note: Rebalancing of files is an I/O-intensive and time-consuming operation, and is important only for
file systems with large files that are mostly invariant. In many cases, normal file update and creation will
rebalance your file system over time, without the cost of the rebalancing.
For complete usage information, see mmrestripefs command in IBM Spectrum Scale: Command and
Programming Reference.
Note: The mmdf command may require considerable metadata I/O, and should be run when the system
load is light.
Specify the file system you want to query with the mmdf command. For example, to query available
space on all disks in the file system fs1, enter:
mmdf fs1
Disks in storage pool: fs1sp1 (Maximum disk size allowed is 122 GB)
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
hd30n01               8897968        8 no       yes         8895488 (100%)           424 ( 0%)
hd31n01               8897968        8 no       yes         8895488 (100%)           424 ( 0%)
                ------------- -------------------- -------------------
(pool total)         17795936 17790976 (100%) 848 ( 0%)
Inode Information
-----------------
Number of used inodes: 9799
Number of free inodes: 4990393
Number of allocated inodes: 5000192
Maximum number of inodes: 5000192
For complete usage information, see mmdf command in IBM Spectrum Scale: Command and Programming
Reference.
In order to write to a file system, free full blocks of disk space are required. Due to fragmentation, it is
entirely possible to have the situation where the file system is not full, but an insufficient number of free
full blocks are available to write to the file system. Replication can also cause the copy of the fragment to
be distributed among disks in different failure groups. The mmdefragfs command can be used to query
the current fragmented state of the file system and reduce the fragmentation of the file system.
In order to reduce the fragmentation of a file system, the mmdefragfs command migrates fragments to
free space in another fragmented disk block of sufficient space, thus creating a free full block. There is no
requirement to have a free full block in order to run the mmdefragfs command. The execution time of
the mmdefragfs command depends on the size and allocation pattern of the file system. For a file system
with a large number of disks, the mmdefragfs command will run through several iterations of its
algorithm, each iteration compressing a different set of disks. Execution time is also dependent on how
fragmented the file system is. The less fragmented a file system, the shorter time for the mmdefragfs
command to execute.
The mmdefragfs command can be run on either a mounted or an unmounted file system, but it achieves
best results on an unmounted file system. Running the command on a mounted file system can cause
conflicting allocation information and consequent retries to find a new free subblock of the correct size to
store the fragment in.
For example, to display the current fragmentation information for file system fs0, enter:
mmdefragfs fs0 -i
For complete usage information, see mmdefragfs command in IBM Spectrum Scale: Command and
Programming Reference.
For example, to reduce the amount of fragmentation for file system fs1 with a goal of 100% utilization,
enter:
mmdefragfs fs1 -u 100
See the mmdefragfs command in IBM Spectrum Scale: Command and Programming Reference for complete
usage information.
You can use the mmbackup command to back up the files of a GPFS file system or the files of an
independent fileset to an IBM Spectrum Protect server.
Alternatively, you can utilize the GPFS policy engine (mmapplypolicy command) to generate lists of files
to be backed up and provide them as input to some other external storage manager.
The file system configuration information can be backed up using the mmbackupconfig command.
Note: Windows nodes do not support the mmbackup, mmapplypolicy, and mmbackupconfig
commands.
The mmbackup command utilizes all the scalable, parallel processing capabilities of the mmapplypolicy
command to scan the file system, evaluate the metadata of all the objects in the file system, and
determine which files need to be sent to backup in IBM Spectrum Protect, as well as which deleted files
should be expired from IBM Spectrum Protect. Both backup and expiration take place when running
mmbackup in the incremental backup mode.
The mmbackup command can interoperate with regular IBM Spectrum Protect commands for backup
and expire operations. However, if any IBM Spectrum Protect incremental or selective backup or expire
commands are used after mmbackup, mmbackup needs to be informed of these activities. Use either the
-q option or the --rebuild option on the next invocation of mmbackup to update its shadow databases.
These databases shadow the inventory of objects in IBM Spectrum Protect so that only new changes will
be backed up in the next incremental mmbackup. Failing to do so will needlessly back up some files
additional times. The shadow database can also become out of date if mmbackup fails due to certain IBM
Spectrum Protect server problems that prevent mmbackup from properly updating its shadow database
after a backup. In these cases it is also required to issue the next mmbackup command with either the -q
option or the --rebuild option.
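For example, a sketch of such an invocation, with an illustrative file system name and server name:
mmbackup fs1 -q -t incremental --tsm-servers tsm1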
Note: Avoid unlinking a fileset while running mmbackup. If a fileset is unlinked before mmbackup
starts, it is handled; however, unlinking a fileset during the job could result in a failure to back up
changed files as well as expiration of already backed up files from the unlinked fileset.
The mmbackup command supports backing up GPFS file system data to multiple IBM Spectrum Protect
servers. The ability to partition file backups across multiple IBM Spectrum Protect servers is particularly
useful for installations that have a large number of files. For information on setting up multiple IBM
Spectrum Protect servers, see “IBM Spectrum Protect requirements” on page 142.
Unless otherwise specified, the mmbackup command backs up the current active version of the GPFS file
system. If you want to create a backup of files at a specific point in time, first use the mmcrsnapshot
command to create either a global snapshot or a fileset-level snapshot, and then specify that snapshot
name for the mmbackup -S option. A global snapshot can be specified for either --scope filesystem or
--scope inodespace. A fileset-level snapshot can only be specified with --scope inodespace.
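For example, the following sketch backs up the file system from a global snapshot; the snapshot and
server names are illustrative:
mmcrsnapshot fs1 backupSnap
mmbackup fs1 -S backupSnap --scope filesystem -t incremental --tsm-servers tsm1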
If an unlinked fileset is detected, the mmbackup processing will issue an error message and exit. You can
force the backup operation to proceed by specifying the mmbackup -f option. In this case, files that
belong to unlinked filesets will not be backed up, but will be removed from the expire list.
If you have file systems that were backed up using the GPFS 3.2 or earlier version of the mmbackup
command, you will not be able to take advantage of some of the new mmbackup features until a new
full backup is performed. See “File systems backed up using GPFS 3.2 or earlier versions of mmbackup”
on page 143.
If you want to create a backup of a fileset at a specific point in time, first use the mmcrsnapshot
command to create a fileset-level snapshot. Next, specify that snapshot name for the mmbackup -S
option along with the --scope inodespace option.
For details on the supported versions of IBM Spectrum Protect, client and server installation and setup,
and include and exclude lists, see the IBM Tivoli® Storage Manager V7.1.7 documentation
(www.ibm.com/support/knowledgecenter/SSGSG7_7.1.7/com.ibm.itsm.ic.doc/welcome.html).
1. Ensure that the supported versions of the IBM Spectrum Protect client and server are installed. See
the IBM Spectrum Scale FAQ in IBM Knowledge Center (www.ibm.com/support/knowledgecenter/
STXKQY/gpfsclustersfaq.html).
2. Ensure that the IBM Spectrum Protect server and clients are configured properly for backup
operations.
3. If you are using multiple IBM Spectrum Protect servers to protect data, ensure that the IBM Spectrum
Protect servers are set up properly.
4. Ensure the required dsm.sys and dsm.opt configuration files are present in the IBM Spectrum Protect
configuration directory on each node used to run mmbackup or named in a node specification with
-N.
5. If you want to include or exclude specific files or directories by using include-exclude lists, ensure
that the lists are set up correctly before you invoke the mmbackup command.
The mmbackup command uses an IBM Spectrum Protect include-exclude list for including and
excluding specific files or directories. See the Tivoli documentation for information about defining an
include-exclude list.
Note: IBM Spectrum Protect interprets its include and exclude statements in a unique manner that is
not precisely matched by the GPFS mmapplypolicy file selection language. The essential meaning of
each supported include or exclude statement is followed, but the commonly used IBM Spectrum
Protect idiom of excluding everything as the last statement and including selective directory or file
name patterns in prior statements should not be used with GPFS and mmbackup. The exclusion
pattern of "/*" is interpreted by mmapplypolicy to exclude everything, and no data is backed up.
A very large include-exclude list can decrease backup performance. Use wildcards and eliminate
unnecessary include statements to keep the list as short as possible.
6. If more than one node will be used to perform the backup operation (mmbackup -N option):
v The mmbackup command will verify that the IBM Spectrum Protect Backup-Archive client versions
and configuration are correct before executing the backup. Any nodes that are not configured
correctly will be removed from the backup operation. Ensure that IBM Spectrum Protect clients are
installed and at the same version on all nodes that will invoke the mmbackup command or
participate in parallel backup operations.
v Ensure that IBM Spectrum Protect is aware that the various IBM Spectrum Protect clients are all
working on the same file system, not different file systems having the same name on different client
machines. This is accomplished by using proxy nodes for multiple nodes in the cluster. See the IBM
Spectrum Protect documentation for recommended settings for GPFS cluster nodes setup.
7. Restoration of backed-up data must be done using IBM Spectrum Protect interfaces. This can be done
with the client command-line interface or the IBM Spectrum Protect web client. The IBM Spectrum
Protect web client interface must be made operational if you wish to use this interface for restoring
data to the file system from the IBM Spectrum Protect server.
8. When more than one IBM Spectrum Protect server is referenced in the dsm.sys file, mmbackup uses
all listed IBM Spectrum Protect servers by default. To use only a specific IBM Spectrum Protect server
or a subset of the servers that are listed in dsm.sys, use the mmbackup --tsm-servers option (see the
example after this list). When more than one IBM Spectrum Protect server is used for backup, the list
and the order specified should remain constant. If additional IBM Spectrum Protect servers are added
to the backup later, add them to the end of the list that is specified with the mmbackup --tsm-servers
option.
9. IBM Spectrum Protect does not support special characters in the path names and in some cases cannot
back up a path name that has special characters. A limited number of special characters are supported
on IBM Spectrum Protect client 6.4.0.0 and later versions with client options
WILDCARDSARELITERAL and QUOTESARELITERAL. Use these IBM Spectrum Protect options
with the mmbackup --noquote option if you have path names with special characters. The
mmbackup command does not back up path names containing any newline, Ctrl+X, or Ctrl+Y
characters. If the mmbackup command finds unsupported characters in the path name, it writes that
path to a file called mmbackup.unsupported.tsmserver at the root of the mmbackup record directory
(by default it is the root of the file system).
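For example, the following sketch runs a backup that uses only two specific servers that are defined in
dsm.sys; the server names are illustrative:
mmbackup fs1 -t incremental --tsm-servers tsm1,tsm2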
Attention: If you are using the IBM Spectrum Protect Backup-Archive client command line or web
interface to do back up, use caution when you unlink filesets that contain data backed up by IBM
Spectrum Protect. IBM Spectrum Protect tracks files by path name and does not track filesets. As a result,
when you unlink a fileset, it appears to IBM Spectrum Protect that you deleted the contents of the fileset.
Therefore, the IBM Spectrum Protect Backup-Archive client inactivates the data on the IBM Spectrum
Protect server, which may result in the loss of backup data during the expiration process.
The GPFS 3.3 through GPFS 3.5.0.11 versions of the mmbackup command will preserve this type of
processing for incremental backups until a new full backup is performed. Once a full backup is
performed, mmbackup will store the files in IBM Spectrum Protect under their usual GPFS root directory
path name; all files under /Device/.snapshots/.mmbuSnapshot will be marked for expiration. Until the
transition to using the usual GPFS root directory path name in IBM Spectrum Protect is complete, no
backups can be taken from a snapshot, other than the mmbackup temporary snapshot called
.mmbuSnapshot.
Attention: Starting with GPFS 4.1, the mmbackup command will no longer support the
/Device/.snapshots/.mmbuSnapshot path name format for incremental backups. After migrating to
GPFS 4.1, if the older .mmbuSnapshot path name format is still in use, a full backup is required if a full
backup has never been performed with GPFS 3.3 or later. After the full backup is performed, files will
now always be stored in IBM Spectrum Protect under their usual GPFS root directory path name. All files
in IBM Spectrum Protect under /Device/.snapshots/.mmbuSnapshot will be marked for expiration
automatically after a successful backup.
The transition to using the usual GPFS root directory path name format, instead of the
/Device/.snapshots/.mmbuSnapshot path name format permits mmbackup to perform a backup using
any user-specified snapshot, or the live file system interchangeably.
Certain features, such as backing up from an arbitrary snapshot, cannot be used until a full backup is
performed with the GPFS 3.3 or later version of the mmbackup command.
The mmbackup command uses one or more shadow database files to determine changes in the file
system. To convert from the IBM Spectrum Protect interface backup to mmbackup, one must create the
shadow database file or files by using the --rebuild option of mmbackup. The rebuild option queries the
existing IBM Spectrum Protect server or servers and creates a shadow database of the files currently
backed up in IBM Spectrum Protect. After the shadow database file or files are generated, mmbackup can
be used for all future incremental or full backups.
Note: If using multiple IBM Spectrum Protect servers to back up a file system, use the mmbackup
--tsm-servers option to ensure that the proper servers participate in the backup job.
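For example, the following sketch rebuilds the shadow database for a file system; the file system and
server names are illustrative:
mmbackup fs1 --rebuild --tsm-servers tsm1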
The mmbackup command performs all its work in three major steps, and all of these steps potentially
use multiple nodes and threads:
1. The file system is scanned with mmapplypolicy, and a list is created of every file that qualifies and
should be in backup for each IBM Spectrum Protect server in use. The existing shadow database and
the list generated are then compared and the differences between them yield:
v Objects deleted recently that should be marked inactive (expire)
v Objects modified or newly created to back up (selective)
v Objects modified without data changes; owner, group, mode, and migration state changes to update
(incremental)
2. Using the lists created in step 1, mmapplypolicy is run for files that should be marked inactive
(expire).
3. Using the lists created in step 1, mmapplypolicy is run for selective or incremental backup.
The mmbackup command has several parameters that can be used to tune backup jobs. During the
scanning phase, the resources mmbackup will utilize on each node specified with the -N parameter can
be controlled:
v The -a IscanThreads parameter allows specification of the number of threads and sort pipelines each
node will run during the parallel inode scan and policy evaluation. This parameter affects the
execution of the high-performance protocol that is used when both the -g and -N parameters are
specified. The default value is 2. Using a moderately larger number can significantly improve
performance, but might strain the resources of the node. In some environments a large value for this
parameter can lead to a command failure.
Tip: Set this parameter to the number of CPU cores implemented on a typical node in your GPFS
cluster.
v The -n DirThreadLevel parameter allows specification of the number of threads that will be created and
dispatched within each mmapplypolicy process during the directory scan phase.
During the execution phase for expire, mmbackup processing can be adjusted as follows:
v Automatic computation of the ideal expire bunch count. The number of objects named in each file list
can be determined, separately from the number in a backup list, and automatically computed, if not
specified by the user.
v As an alternative to the automatic computation, the user can control expire processing as follows:
– The --max-expire-count parameter can be used to specify a bunch-count limit for each dsmc expire
command. This parameter cannot be used in conjunction with -B.
– The --expire-threads parameter can be used to control how many threads run on each node running
dsmc expire. This parameter cannot be used in conjunction with -m.
For more information on the mmbackup tuning parameters, see mmbackup command in IBM Spectrum Scale:
Command and Programming Reference.
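For example, the following sketch runs an incremental backup that spreads the scan over two nodes with
moderately increased thread counts; the node names, global work directory, and server name are
illustrative:
mmbackup fs1 -t incremental -N c5n100,c5n101 -g /gpfs/fs1/.mmbackupWorkDir -a 4 -n 2 --tsm-servers tsm1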
The $progressCallOut function is executed if the path $progressCallOut names a valid, executable file
and one of the following is true:
v The message class provided with this message is 0.
Or
v At least $progressInterval seconds has elapsed.
Or
v The $progressContent mask has a bit set which matches a bit set in the message class provided with
this message.
The $progressCallOut function is executed during mmbackup with a single argument consisting of the
following colon-separated values:
"$JOB:$FS:$SERVER:$NODENAME:$PHASE:$BCKFILES:$CHGFILES:$EXPFILES:\
$FILESBACKEDUP:$FILESEXPIRED:$ERRORS:$TIME"
Where:
JOB
Specifies the literal backup string to identify this component.
FS Specifies the file system device name.
SERVER
Specifies the IBM Spectrum Protect server currently used for backup.
NODENAME
Specifies the name of the node where mmbackup was started.
PHASE
Specifies either synchronizing, scanning, selecting files, expiring, backing up, analyzing, or
finishing.
BCKFILES
Specifies the total number of files already backed up, or stored, on the IBM Spectrum Protect server.
Starts as the count of all normal mode records in all the current shadow databases in use. If QUERY
is being executed, it will start as the count of files found on the IBM Spectrum Protect server. It will
stay constant until the backup job is complete.
For more information on GPFS policies and rules refer to Chapter 26, “Information lifecycle management
for IBM Spectrum Scale,” on page 363.
Note: The mmbackupconfig command only backs up the file system configuration information. It does
not back up any user data or individual file attributes.
It is recommended that you store the output file generated by mmbackupconfig in a safe location.
IBM has supplied a set of subroutines that are useful to create backups or collect information about all
files in a file system. Each subroutine is described in Programming interfaces in IBM Spectrum Scale:
Command and Programming Reference. These subroutines are more efficient for traversing a file system, and
provide more features than the standard POSIX interfaces. These subroutines operate on a global
snapshot or on the active file system. They have the ability to return all files, or only files that have
changed since some earlier snapshot, which is useful for incremental backup.
The gpfs_ireadx() subroutine is more efficient than read() or gpfs_iread() for sparse files and for
incremental backups. The gpfs_ireaddir() or gpfs_ireaddir64() subroutine is more efficient than
readdir(), because it returns file type information. There are also subroutines for reading symbolic links,
gpfs_ireadlink() or gpfs_ireadlink64() and for accessing file attributes, gpfs_igetattrs().
The SOBAR utilities include the commands mmbackupconfig, mmrestoreconfig, mmimgbackup, and
mmimgrestore. The mmbackupconfig command will record all the configuration information about the
file system to be protected and the mmimgbackup command performs a backup of GPFS file system
metadata. The resulting configuration data file and the metadata image files can then be copied to the
IBM Spectrum Protect server for protection. In the event of a disaster, the file system can be recovered by
recreating the necessary NSD disks, restoring the file system configuration with the mmrestoreconfig
command, and then restoring the image of the file system with the mmimgrestore command. The
mmrestoreconfig command must be run prior to running the mmimgrestore command. SOBAR will
reduce the time needed for a complete restore by utilizing all available bandwidth and all available nodes
in the GPFS cluster to process the image data in a highly parallel fashion. It will also permit users to
For the full details of the SOBAR procedures and requirements, see Scale Out Backup and Restore (SOBAR)
in IBM Spectrum Scale: Command and Programming Reference.
For scheduled events to occur on the client, you must configure the client scheduler to communicate with
the IBM Spectrum Protect server. This is in addition to the following steps. For example, you might need
to start the dsmcad service or add MANAGEDSERVICES schedule to the corresponding IBM Spectrum Protect
stanza in dsm.sys on the client node. For more information, see Configuring the scheduler in the IBM
Spectrum Protect documentation on IBM Knowledge Center.
Note: The following example script must be extended to log the output into files so that verification
or troubleshooting can be done afterwards. Additional options such as --noquote might be needed
depending on the specific needs of the environment.
#!/bin/bash
/usr/lpp/mmfs/bin/mmcrsnapshot gpfs0 BKUPsnap
/usr/lpp/mmfs/bin/mmbackup gpfs0 -t incremental --tsm-servers tsm1
/usr/lpp/mmfs/bin/mmdelsnapshot gpfs0 BKUPsnap
4. On one of the IBM Spectrum Protect client nodes, verify the schedule by using the following command:
dsmc q sched
Note: Refer to the latest IBM Spectrum Protect documentation on IBM Knowledge Center for the latest
information on the mentioned settings.
Important: While the IBM Spectrum Protect client configuration file dsm.sys can contain node-specific
information, it cannot simply be copied from node to node without adjusting the corresponding
node-specific information.
File path name patterns that do not need to be backed up, for example temporary files, can be excluded
by corresponding exclude statements. While IBM Spectrum Protect provides options both for excluding
and for including, avoid the use of include options when mmbackup is used. The reason is that
mmbackup processing works properly with exclude statements, but misinterpretations can arise when
include and exclude options are used together and, in the worst case, have overlapping pattern sequences.
Note: Defining a large number of exclude rules can negatively impact the performance of backup.
Do not add exclude statements for snapshots as snapshots are specially handled automatically by
mmbackup and IBM Spectrum Protect options when needed.
mmbackup excludes the following folders from the scan by default and these need not be explicitly
excluded in the dsm.sys file or on the IBM Spectrum Protect server:
v .mmbackup* - folder in location specified by MMBACKUP_RECORD_ROOT such as /ibm/gpfs0/.mmbackupCfg
v .mmLockDir - folder in the root of the file system
v .SpaceMan - folder anywhere in the file system
v .TsmCacheDir - folder anywhere in the file system
Special consideration is needed when IBM Spectrum Protect server management class definitions are
used. The corresponding include statements must be applied to any dsm.sys and not applied on the IBM
Spectrum Protect server.
IBM Spectrum Protect users might be familiar with dynamic management class assignments available
when using IBM Spectrum Protect dsmc commands to back up files. This is not the case with mmbackup.
Only objects identified by mmbackup as requiring a backup will get the needed management class update
that results when the administrator alters the management class assignment in the dsm.sys file. Therefore,
only by running a complete backup of all affected objects can a management class update be guaranteed.
Despite the recommendation to never use include statements in dsm.sys, when an IBM Spectrum
Protect management class designation is needed, an include statement with the management class
specification is required. In these cases, do the following steps:
1. In the IBM Spectrum Protect client configuration file dsm.sys, arrange the include and exclude
statements as follows:
a. Place all the include statements first in the file, along with the management class definitions.
b. Add the exclude statements below the include statements.
c. Ignore the ordering precedence rules defined in the IBM Spectrum Protect documentation
regarding the ordering of these statements. Management class include statements must be listed
above the exclude statements to work properly with mmbackup.
Note: Do not add include statements after exclude statements. Do not add exclude statements before
include statements.
2. Before starting the mmbackup job, set the following environment variable:
export MMBACKUP_IGNORE_INCLUDE=1
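The following is a minimal sketch of the ordering that is described in step 1. The path patterns and the
management class name MCDATA are placeholders, not values taken from your environment:
* Management class include statements first
INCLUDE /ibm/gpfs0/projects/.../* MCDATA
* Exclude statements below the include statements
EXCLUDE /ibm/gpfs0/scratch/.../*
EXCLUDE.DIR /ibm/gpfs0/tmp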
In a cluster, an operation that needs to scale, such as backup, is usually executed on more than one
node. To utilize the services of an IBM Spectrum Protect server from any of the configured cluster
backup nodes, the administrator needs to specify a proxy node. This proxy node needs to be
created on the IBM Spectrum Protect server similar to all other cluster backup nodes that need to be
registered on the IBM Spectrum Protect server before they can be used. On all cluster backup nodes, set
the asnodename option for the desired proxy-client node to be used in the corresponding stanza of the
dsm.sys configuration file.
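As a hedged sketch, assuming a proxy node named gpfsproxy and cluster backup nodes node1 and node2
(all names and the password are placeholders), the proxy definition entered on the IBM Spectrum Protect
server administrative command line might look like the following:
register node gpfsproxy proxypassword
grant proxynode target=gpfsproxy agent=node1,node2
On each cluster backup node, the corresponding dsm.sys stanza then contains lines such as:
ASNODENAME gpfsproxy
NODENAME node1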
If file or directory names contain special characters such as wildcard characters or quotation marks,
enable the IBM Spectrum Protect client options WILDCARDSARELITERAL and QUOTESARELITERAL
on all nodes that are used in backup activities and make sure that the mmbackup option --noquote is
used when invoking mmbackup.
Note: The characters control-X and control-Y are not supported by IBM Spectrum Protect. Therefore, the
use of these characters in file names in IBM Spectrum Scale file systems results in these files not getting
backed up to IBM Spectrum Protect.
Base IBM Spectrum Protect client configuration files for IBM Spectrum
Scale usage
This topic lists the base IBM Spectrum Protect client configuration files, with examples, for use with IBM
Spectrum Scale.
Important: While the IBM Spectrum Protect client configuration file dsm.sys can contain node-specific
information, it cannot simply be copied from node to node without correcting that node-specific
information.
Contents of dsm.sys
Note: Substitute the variables starting with '$' with your own required values. See the following
example values of the variables.
Servername $servername
COMMMethod TCPip
TCPPort $serverport
TCPServeraddress $serverip
* TCPAdminport $serveradminport
TCPBuffsize 512
PASSWORDACCESS generate
* Place your exclude rules here or configure as cloptset on TSM server
ERRORLOGName $errorlog
ASNODENAME $client-node-proxyname
NODENAME $localnodename
Contents of dsm.opt
* Special character test flags
QUOTESARELITERAL YES
WILDCARDSARELITERAL YES
* to take traces just remove the * from the next two lines:
*TRACEFLAG SERVICE
*TRACEFILE /tmp/tsmtrace.txt
Contents of dsm.opt when IBM Spectrum Protect for Space Management is used
* HSM: Write extObjID to DMAPI attribute 'IBMexID' for migrated/pre-migrated files
HSMEXTOBJIDATTR yes
* HSM: Deactivate HSM Automigration and Scout search engine as this will be done by GPFS
HSMDISABLEAUTOMIGDAEMONS YES
* HSM file aggregation of small files
HSMGROUPedmigrate yes
* HSM: Determines if files that are less than 2 minutes old can be migrated during selective migration
hsmenableimmediatemigrate yes
For information on how to create and maintain snapshots, see Chapter 27, “Creating and maintaining
snapshots of file systems,” on page 421.
Use these steps to restore files or directories from a local file system snapshot.
1. Use the mmlssnapshot device command to list the snapshots in the file system and make a note of the
snapshot that contains the files and directories that you want to restore.
device is the name of the file system.
# mmlssnapshot fs1
For information on how to create and maintain snapshots, see Chapter 27, “Creating and maintaining
snapshots of file systems,” on page 421.
Use these steps to restore files or directories from a local fileset snapshot.
1. Use the mmlssnapshot device command to list the snapshots in the file system and make a note of the
snapshot that contains the files and directories that you want to restore.
device is the name of the file system.
# mmlssnapshot fs1
Use the mmcdpsnapqueryrecover sample script to restore files or directories from snapshots into the
user-specified restorePath directory as follows.
1. Use the following command to list all copies of a file or directory in a file system or fileset snapshot.
/usr/lpp/mmfs/samples/ilm/mmcdpsnapqueryrecover.sh Device \
--file-path fsPath --destination-dir restorePath
Where:
v device is the name of the file system.
v file-path is the full file path.
v destination-dir is the full path of the restore directory.
For example, to get all copies of the file /gpfs0/gplssnapshot in the file system gpfs0 and with /opt
as the restore directory, enter the following:
/usr/lpp/mmfs/samples/ilm/mmcdpsnapqueryrecover.sh /dev/gpfs0 \
--file-path /gpfs0/gplssnapshot --destination-dir /opt
Use the Create File System option available in the Files > File Systems page to create file systems on
existing NSDs.
Deleting a file system removes all of the data on that file system. Use caution when performing this task.
To delete a file system, select the file system to be deleted and then select Delete from the Actions menu.
The File Systems page provides an easy way to monitor the performance, health status, and configuration
aspects of all the available file systems in the IBM Spectrum Scale cluster.
The following options are available to analyze the file system performance:
1. A quick view that gives the number of NSD servers and NSDs that are part of the available file
systems that are mounted on the GUI server. It also provides overall capacity and total throughput
details of these file systems. You can access this view by selecting the expand button that is placed
next to the title of the page. You can close this view if not required.
The graphs displayed in the quick view are refreshed regularly. The refresh interval depends on
the displayed time frame, as shown below:
v Every minute for the 5 minutes time frame
v Every 15 minutes for the 1 hour time frame
v Every six hours for the 24 hours time frame
v Every two days for the 7 days time frame
v Every seven days for the 30 days time frame
v Every four months for the 365 days time frame
You can control the access to files and directories in a file system by defining access control lists (ACLs).
ACLs can be inherited within a file system. The link path of the file system does not inherit any ACL
from a parent path. Therefore, you can set the ACL of the file system link path using the Edit Access
Control option.
When creating a file system, a default ACL is set. To modify the access controls defined for a file system,
right-click the file system that is listed in the file system view and select Edit Access Control. The owner,
owning group, and access control list cannot be modified if the directory is not empty. Users with the
role Dataaccess are allowed to modify owner, group, and ACL even when the directory is not empty.
You can use the IBM Spectrum Scale GUI to mount or unmount individual file systems or multiple file
systems on the selected nodes. Use the Files > File Systems, Files > File Systems > View Details >
Nodes, or Nodes > View Details > File Systems page in the GUI to mount or unmount a file system.
The GUI has the following options related to mounting the file system:
1. Mount local file systems on nodes of the local IBM Spectrum Scale cluster.
2. Mount remote file systems on local nodes.
3. Select individual nodes, protocol nodes, or nodes by node class while selecting nodes on which the
file system needs to be mounted.
4. Prevent or allow file systems from mounting on individual nodes.
Do the following to prevent file systems from mounting on a node:
a. Go to Nodes.
b. Select the node on which you need to prevent or allow file system mounts.
c. Select Prevent Mounts from the Actions menu.
d. Select the required option and click Prevent Mount or Allow Mount based on the selection.
5. Configure the automatic mount option. The automatic mount option determines whether to
automatically mount the file system on nodes when the GPFS daemon starts or when the file system
is accessed for the first time. You can also specify whether to exclude individual nodes while enabling
the automatic mount option. To enable automatic mount, do the following:
a. From the Files > File Systems page, select the file system for which you need to enable automatic
mount.
b. Select Configure Automatic Mount option from the Actions menu.
c. Select the required option from the list of automatic mount modes.
d. Click Configure.
Note: You can configure the automatic mount option for a file system only if the file system is
unmounted from all nodes. That is, you need to stop I/O on this file system to configure this option.
However, you can include or exclude the individual nodes for automatic mount without unmounting
the file system from all nodes.
You can utilize the following unmount features that are supported in the GUI:
1. Unmount local file system from local nodes and remote nodes.
2. Unmount a remote file system from the local nodes. When a local file system is unmounted from the
remote nodes, the remote nodes can no longer be seen in the GUI. The Files > File Systems > View
Details > Remote Nodes page lists the remote nodes that currently mount the selected file system.
The selected file system can be a local or a remote file system, but the GUI permits unmounting only
local file systems from the remote nodes.
Some administrative actions, such as repairing file system structures by using the mmfsck command,
require that the file system be unmounted on all nodes.
Policies
IBM Spectrum Scale provides a way to automate the management of files by using policies and rules. You
can manage these policies and rules through the Files > Information Lifecycle page of the management
GUI.
A policy rule is an SQL-like statement that tells the file system what to do with the data for a file in a
specific pool if the file meets specific criteria. A rule can apply to any file being created or only to files
being created within a specific fileset or group of filesets.
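For illustration only, a simple placement policy can also be defined and applied from the command line;
in the following sketch the rule names, pool names, and the device name gpfs0 are hypothetical, and the
same rules can equally be entered through the Information Lifecycle page:
cat > /tmp/placement.pol <<'EOF'
/* Place temporary files in the 'data' pool; all other files go to the 'system' pool. */
RULE 'tmpFiles' SET POOL 'data' WHERE LOWER(NAME) LIKE '%.tmp'
RULE 'default' SET POOL 'system'
EOF
/usr/lpp/mmfs/bin/mmchpolicy gpfs0 /tmp/placement.pol -I test   # validate the policy file only
/usr/lpp/mmfs/bin/mmchpolicy gpfs0 /tmp/placement.pol           # install the placement policy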
| File audit logging records file operations on a file system and logs them in a retention-enabled fileset.
| Each file operation is generated as a local event on the node that serves the file operation. These events
| are published to a distributed multinode message queue from which they are consumed to be written
| into the fileset.
| You can enable the file audit logging either while creating a file system by using the Create File System
| option or by using the Enable File Auditing option from the Actions menu for an already created file
| system.
| While enabling the file audit logging, you can specify the following details:
| v The file system in which the file audit log must be stored.
| v Name of the fileset where the audit log must be stored.
| v The period for which the log must be retained.
| Note: The GUI offers to enable file audit logging for a file system if file audit logging and message
| queue are installed and configured. For more information on installing file audit logging and its
| components, see Manually installing file audit logging.
| To disable file audit logging, select the file system for which you need to disable the feature and select
| Disable File Auditing from the Actions menu.
| For more information on file auditing, see File Auditing in the IBM Spectrum Scale: Concepts, Planning, and
| Installation Guide.
The file system format number is assigned when the file system is first created and is updated to the
latest supported level after the file system is migrated with the mmchfs -V command.
The format number for a file system can be displayed with the mmlsfs -V command. If a file system was
created with an older GPFS release, new functionality that requires different on-disk data structures is not
enabled until you run the mmchfs -V command. Some new features might require you also to run the
mmmigratefs command.
Note: The -V parameter cannot be used to make file systems that were created before GPFS 3.2.1.5
available to Windows nodes. Windows nodes can mount only file systems that were created with GPFS
3.2.1.5 or later.
The mmchfs -V parameter requires the specification of one of two values, full or compat:
v Specifying mmchfs -V full enables all of the new functionality that requires different on-disk data
structures. After this command, nodes in remote clusters that are running an older GPFS version will
no longer be able to mount the file system.
The mmchfs -V full command displays a warning as in the following example:
# mmchfs n03NsdOnFile36 -V full
You have requested that the file system be upgraded to
version 19.01 (5.0.1.0). This will enable new functionality but will
prevent you from using the file system with earlier releases of GPFS.
Do you want to continue?
v Specifying mmchfs -V compat enables only backward-compatible format changes. Nodes in remote
clusters that were able to mount the file system before the format changes can continue to do so
afterward.
| In IBM Spectrum Scale 5.0.2, new file systems are created at file system format number 20.01. To update a
| file system from an earlier format to format number 20.01, issue the following command:
| mmchfs Device -V full
| where Device is the device name of the file system. The following features of IBM Spectrum Scale 5.0.2
| require a file system to be at format number 20.01 or later:
| v The afmGateway attribute of the mmchfileset command specifies a user-defined gateway node for an
| AFM or AFM DR fileset that is given preference over the internal hashing algorithm.
| v The maxActiveIallocSegs performance attribute of the mmchconfig command controls the number of
| active inode allocation segments that are maintained on a node. In IBM Spectrum Scale 5.0.2 and later,
| the default number is 8 and the range is 1 - 64. In earlier versions, both the default value and the
| maximum value are 1.
| v The watch folder feature provides callback events for monitoring file accesses to folders, filesets, and
| inode spaces. For more information, see the topic Introduction to watch folder in the IBM Spectrum Scale:
| Concepts, Planning, and Installation Guide.
In IBM Spectrum Scale 5.0.1, new file systems are created at format number 19.01. To update the format
of an earlier file system to format number 19.01, issue the following command:
mmchfs Device -V full
In IBM Spectrum Scale 5.0.0, new file systems are created at format number 18.00. To update the format
of an earlier file system to format number 18.00, issue the following command:
mmchfs Device -V full
where Device is the device name of the earlier file system. The following features of IBM Spectrum Scale
5.0.0 require a file system to be at format number 18.00 or later:
v Smaller subblock sizes for file systems that have a large data block size
Note: This feature is supported only for file systems that are created at file system format number
18.00 or later. It is not supported for file systems that are updated to format number 18.00 or later from
an earlier format number. For more information, see the parameter -B BlockSize in the topic mmcrfs in
the IBM Spectrum Scale: Command and Programming Reference.
v File compression with the lz4 compression library
v File audit logging
In IBM Spectrum Scale 4.2.3.0, new file systems are created at format number 17.00. To update the format
of an earlier file system to format number 17.00, issue the following command:
mmchfs Device -V full
where Device is the device name of the earlier file system. The following features of IBM Spectrum Scale
4.2.3.0 require a file system to be at format number 17.00 or later:
v Quality of Service for I/O (QoS)
v File compression with zlib compression library
v Information lifecycle management (ILM) for snapshots
If your current file system is at format number 14.20 (IBM Spectrum Scale 4.1.1), the set of enabled
features depends on the value specified with the mmchfs -V option:
v After running mmchfs -V full, the file system can support the following:
– Enabling and disabling of quota management without unmounting the file system.
– The use of fileset-level integrated archive manager (IAM) modes.
v There are no new features that can be enabled with mmchfs -V compat.
If your current file system is at format number 14.04 (GPFS 4.1.0.0), the set of enabled features depends
on the value specified with the mmchfs -V option:
v After running mmchfs -V full, the file system can support different block allocation map types on an
individual storage-pool basis.
v There are no new features that can be enabled with mmchfs -V compat.
If your current file system is at format number 13.23 (GPFS 3.5.0.7), the set of enabled features depends
on the value specified with the mmchfs -V option:
v After running mmchfs -V full, the file system can support the following:
– Directory block sizes can be up to 256 KB in size (previous maximum was 32 KB).
– Directories can reduce their size when files are removed.
v There are no new features that can be enabled with mmchfs -V compat.
If your current file system is at format number 13.01 (GPFS 3.5.0.1), the set of enabled features depends
on the value specified with the mmchfs -V option:
v After running mmchfs -V full, the file system can support the following:
– extended storage pool properties
If your current file system is at format number 12.03 (GPFS 3.4), the set of enabled features depends on
the value specified with the mmchfs -V option:
v After running mmchfs -V full, the file system can support the following:
– independent filesets and snapshots of individual independent filesets
– active file management (AFM)
– file clones (writable snapshots of a file)
– policy language support for new attributes, variable names, and functions: OPTS clause for the SET
POOL and RESTORE rules, encoding of path names via an ESCAPE clause for the EXTERNAL LIST
and EXTERNAL POOL rules, GetEnv(), GetMMconfig(), SetXattr(), REGEX().
v There are no new features that can be enabled with mmchfs -V compat.
If your current file system is at format number 11.03 (GPFS 3.3), the set of enabled features depends on
the value specified with the mmchfs -V option:
v After running mmchfs -V full, the file system can support the following:
– more than 2,147,483,648 files
– fast extended attributes (which requires mmmigratefs to be run also)
v There are no new features that can be enabled with mmchfs -V compat.
If your current file system is at format number 10.00 (GPFS 3.2.0.0) or 10.01 (GPFS 3.2.1.5), after running
mmchfs -V, the file system can support all of the features included with earlier levels, plus the following:
v new maximum number of filesets in a file system (10000)
v new maximum for the number of hard links per object (2**32)
v improved quota performance for systems with large number of users
v policy language support for new attributes, variable names, and functions: MODE, INODE, NLINK,
RDEVICE_ID, DEVICE_ID, BLOCKSIZE, GENERATION, XATTR(), ATTR_INTEGER(), and
XATTR_FLOAT()
If your current file system is at format number 9.03 (GPFS 3.1), after running mmchfs -V, the file system
can support all of the features included with earlier levels, plus:
v fine grain directory locking
v LIMIT clause on placement policies
If your current file system is at format number 8.00 (GPFS 2.3), after running mmchfs -V, the file system
can support all of the features included with earlier levels, plus:
v storage pools
v filesets
v fileset quotas
If your current file system is at format number 7.00 (GPFS 2.2), after running mmchfs -V, the file system
can support all of the features included with earlier levels, plus:
v NFS V4 access control lists
v new format for the internal allocation summary files
If your current file system is at format number 6.00 (GPFS 2.1), after running mmchfs -V, the file system
can support all of the features included with earlier levels, plus extended access control list entries (-rwxc
access mode bits).
The functionality described in this topic is only a subset of the functional changes introduced with the
different GPFS releases. Functional changes that do not require changing the on-disk data structures are
not listed here. Such changes are either immediately available when the new level of code is installed, or
require running the mmchconfig release=LATEST command. For a complete list, see the “Summary of
changes” on page xxvii.
Disks can have connectivity to each node in the cluster, be managed by network shared disk servers, or a
combination of the two. For more information, see mmcrnsd command in the IBM Spectrum Scale: Command
and Programming Reference. Also see, Network Shared Disk (NSD) creation considerations in the IBM Spectrum
Scale: Concepts, Planning, and Installation Guide.
Note: A LUN provided by a storage subsystem is a disk for the purposes of this documentation, even if
the LUN is made up of multiple physical disks.
The default is to display information for all disks defined to the cluster (-a). Otherwise, you may choose
to display the information for a particular file system (-f) or for all disks which do not belong to any file
system (-F).
To display the default information for all of the NSDs belonging to the cluster, enter:
mmlsnsd
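Similarly, assuming a file system that is named fs1 (a placeholder), you can display only the NSDs that
belong to that file system, or only the free NSDs:
mmlsnsd -f fs1
mmlsnsd -F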
Storage in a file system is divided in storage pools. The maximum size of any one disk that can be added
to an existing storage pool is set approximately to the sum of the disk sizes when the storage pool is
created. The actual value is shown in the mmdf command output.
Once a storage pool is created, the maximum size cannot be altered. However, you can create a new pool
with larger disks, and then move data from the old pool to the new one.
When establishing a storage pool and when adding disks later to an existing storage pool, you should try
to keep the sizes of the disks fairly uniform. GPFS allocates blocks round robin, and as the utilization
level rises on all disks, the small ones will fill up first and all files created after that will be spread across
fewer disks, which reduces the amount of prefetch that can be done for those files.
The disk may then be added to the file system using the stanza file as input to the mmadddisk command.
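A hedged sketch of such a stanza file and the corresponding command follows; the NSD name, usage,
failure group, pool, file name, and file system name are placeholders. Contents of /tmp/newdisk.stanza:
%nsd: nsd=gpfs10nsd usage=dataOnly failureGroup=2 pool=data
The disk is then added with:
/usr/lpp/mmfs/bin/mmadddisk fs1 -F /tmp/newdisk.stanza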
Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only for
file systems with large files that are mostly invariant. In many cases, normal file update and creation will
rebalance your file system over time, without the cost of the rebalancing.
For more information, see the mmadddisk command and the mmcrnsd command in the IBM Spectrum Scale:
Command and Programming Reference.
Consider how fragmentation might increase your storage requirements, especially when the file system
contains a large number of small files. A margin of 150 percent of the size of the disks being deleted
should be sufficient to allow for fragmentation when small files predominate. For example, in order to
delete a 400 GB disk from your file system, which contains user home directories with small files, you
should first determine that the other disks in the file system contain a total of 600 GB of free space.
If you do not replicate your file system data, you should rebalance the file system using the mmrestripefs
-b command. If you replicate your file system data, run the mmrestripefs -r command after the disk has
been deleted. This ensures that all data will still exist with correct replication after the disk is deleted. The
mmdeldisk command only migrates data that would otherwise be lost, not data that will be left in a
single copy.
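For example, with a hypothetical file system fs1:
mmrestripefs fs1 -b   # rebalance when the file system data is not replicated
mmrestripefs fs1 -r   # restore correct replication after deleting a disk from a replicated file system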
Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only for
file systems with large files that are mostly invariant. In many cases, normal file update and creation will
rebalance your file system over time, without the cost of the rebalancing.
Do not delete stopped disks, if at all possible. Start any stopped disk before attempting to delete it from
the file system. If the disk cannot be started, you will have to consider it permanently damaged. You will
need to delete the disk using the appropriate mmdeldisk options. If metadata was stored on the disk,
you will need to run mmfsck in offline mode afterwards. For more information on handling this
situation, see NSD and underlying disk subsystem failures in the IBM Spectrum Scale: Problem Determination
Guide.
When deleting disks from a file system, the disks might or might not be available. If the disks being
deleted are still available, GPFS moves all of the data from those disks to the disks remaining in the file
system. However, if the disks being deleted are damaged, either partially or permanently, it is not
possible to move all of the data and you will receive I/O errors during the deletion process. For
instructions on how to handle damaged disks, see Disk media failure in the IBM Spectrum Scale: Problem
Determination Guide.
Specify the file system and the names of one or more disks to delete with the mmdeldisk command. For
example, to delete the disk hd2n97 from the file system fs2 enter:
mmdeldisk fs2 hd2n97
For syntax and usage information, refer to mmdeldisk command in the IBM Spectrum Scale: Command and
Programming Reference.
When replacing disks in a GPFS file system, first decide if you will:
1. Create new disks using the mmcrnsd command.
In this case, you must also decide whether to create a new set of NSD and pools stanzas or use the
rewritten NSD and pool stanzas that the mmcrnsd command produces. In a rewritten file, the disk
usage, failure group, and storage pool values are the same as the values that are specified in the
mmcrnsd command.
2. Select NSDs no longer in use by another GPFS file system. Issue the mmlsnsd -F command to display
the available disks.
To replace a disk in the file system, use the mmrpldisk command. For example, to replace the NSD
hd3n97 in file system fs2 with the existing NSD hd2n97, which is no longer in use by another file system,
enter:
mmrpldisk fs2 hd3n97 hd2n97
Note: If you attempt to replace a stopped disk and the file system is not replicated, the attempt will fail.
However, you can replace a stopped disk if the file system is replicated. You can do so in one of the
following ways:
v Deletion, addition, and rebalancing method:
1. Use the mmdeldisk command to delete the stopped disk from the file system.
2. Use the mmadddisk command to add a replacement disk.
3. Use the mmrestripefs -b command to rebalance the file system.
While this method requires rebalancing, it returns the system to a protected state faster (because it can
use all of the remaining disks to create new replicas), thereby reducing the possibility of losing data.
—Or—
v Direct replacement method:
Use the mmrpldisk command to directly replace the stopped disk.
The mmrpldisk command only runs at single disk speed because all data being moved must be
written to the replacement disk. The data is vulnerable while the command is running, and should a
second failure occur before the command completes, it is likely that some data will be lost.
For more information on handling this situation, see Disk media failure in the IBM Spectrum Scale: Problem
Determination Guide.
If you need to delete, replace, or suspend a disk and you need to write new data while the disk is offline,
you can disable strict replication before you perform the disk action. However, data written while
replication is disabled will not be properly replicated. Therefore, after you perform the disk action, you
must re-enable strict replication and run the mmrestripefs -r command. To determine if a file system has
strict replication enforced, issue the mmlsfs -K command.
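A hedged sketch of this sequence for a hypothetical file system fs1 follows; restore whichever -K value
mmlsfs reported originally (whenpossible is shown here only as an example):
mmlsfs fs1 -K                  # record the current strict replication setting
mmchfs fs1 -K no               # relax strict replication before the disk action
# ... delete, replace, or suspend the disk and write any required data ...
mmchfs fs1 -K whenpossible     # restore the original setting
mmrestripefs fs1 -r            # re-replicate data written while replication was relaxed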
The information includes parameters that were specified on the mmcrfs command, and the current
availability and status of the disks. For example, to display the current status of the disk hd8vsdn100 in
the file system fs1, enter:
mmlsdisk fs1 -d hd8vsdn100
For syntax and usage information, see the mmlsdisk command in the IBM Spectrum Scale: Command and
Programming Reference.
Disk availability
The following information lists the possible values of disk availability, and what they mean.
A disk's availability determines whether GPFS is able to read and write to the disk. There are four
possible values for availability:
up The disk is available to GPFS for normal read and write operations.
down No read and write operations can be performed on the disk.
recovering
An intermediate state for disks coming up, during which GPFS verifies and corrects data. A write
operation can be performed while a disk is in this state, but a read operation cannot, because
data on the disk being recovered might be stale until the mmchdisk start command completes.
unrecovered
The disk was not successfully brought up.
Disk availability is automatically changed from up to down when GPFS detects repeated I/O errors. You
can also change the availability of a disk by issuing the mmchdisk command.
Disk status
The following information lists the possible values for disk status, and what they mean.
Disk status controls data placement and migration. Status changes as a result of a pending delete
operation, or when the mmchdisk command is issued to allow file rebalancing or re-replicating prior to
disk replacement or deletion.
GPFS migrates data off disks with a status of being emptied, replacing, to be emptied, or suspended
onto disks with a status of ready or replacement. During disk deletion or replacement, data is
automatically migrated as part of the operation. Issue the mmrestripefs command to initiate data
migration from a suspended disk.
See “Deleting disks from a file system” on page 166, “Replacing disks in a GPFS file system” on page
168, and “Restriping a GPFS file system” on page 136.
Refer to “Displaying GPFS disk states” on page 170 for a detailed description of disk states. You can
change both the availability and status of a disk using the mmchdisk command:
v Change disk availability using the mmchdisk command and the stop and start options
v Change disk status using the mmchdisk command and the suspend and resume options.
Issue the mmchdisk command with one of the following four options to change disk state:
resume
Informs GPFS that a disk previously suspended is now available for allocating new space. Resume a
disk only when you suspended it and decided not to delete or replace it. If the disk is currently in a
stopped state, it remains stopped until you specify the start option. Otherwise, normal read and write
access to the disk resumes.
start
Informs GPFS that a disk previously stopped is now accessible. GPFS does this by first changing the
disk availability from down to recovering. The file system metadata is then scanned and any missing
updates (replicated data that was changed while the disk was down) are repaired. If this operation is
successful, the availability is then changed to up.
If the metadata scan fails, availability is set to unrecovered. This could occur if other disks remain in
recovering or an I/O error has occurred. Repair all disks and paths to disks. It is recommended that
you run mmfsck at this point (For more information, see mmfsck command in the IBM Spectrum Scale:
Command and Programming Reference). The metadata scan can then be re-initiated at a later time by
issuing the mmchdisk start command again.
Note: A disk remains suspended until it is explicitly resumed. Restarting GPFS or rebooting nodes
does not restore normal access to a suspended disk.
The empty option is similar to the suspend option. In GPFS 4.1.1 and earlier, the output of the
mmlsdisk command displays the status as suspended.
For example, to suspend the hd8vsdn100 disk in the file system fs1, enter:
mmchdisk fs1 suspend -d hd8vsdn100
You can also use the mmchdisk command with the change option to change the Disk Usage and Failure
Group parameters for one or more disks in a GPFS file system. This can be useful in situations where, for
example, a file system that contains only RAID disks is being upgraded to add conventional disks that
are better suited to storing metadata. After adding the disks using the mmadddisk command, the
metadata currently stored on the RAID disks would have to be moved to the new disks to achieve the
desired performance improvement. To accomplish this, first the mmchdisk change command would be
issued to change the Disk Usage parameter for the RAID disks to dataOnly. Then the mmrestripefs
command would be used to restripe the metadata off the RAID device and onto the conventional disks.
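A hedged sketch of that sequence with placeholder NSD and file system names follows; verify the exact
stanza and option usage against the mmchdisk and mmrestripefs command references for your release.
Contents of /tmp/usage.stanza:
%nsd: nsd=raid1nsd usage=dataOnly
%nsd: nsd=raid2nsd usage=dataOnly
Then:
/usr/lpp/mmfs/bin/mmchdisk fs1 change -F /tmp/usage.stanza
/usr/lpp/mmfs/bin/mmrestripefs fs1 -r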
For complete usage information, see the mmchdisk command and the mmlsdisk command in the IBM
Spectrum Scale: Command and Programming Reference.
Once your NSDs have been created, you may change the configuration attributes as your system
requirements change. For more information about creating NSDs, see the IBM Spectrum Scale: Concepts,
Planning, and Installation Guide and the mmcrnsd command in the IBM Spectrum Scale: Command and
Programming Reference.
For example, to assign node k145n07 as an NSD server for disk gpfs47nsd:
1. Make sure that k145n07 is not already assigned to the server list by issuing the mmlsnsd command.
mmlsnsd -d "gpfs47nsd"
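The follow-on change is made with the mmchnsd command. The following is a hedged sketch that
reuses the names above; note that the server list given to mmchnsd becomes the complete NSD server
list, so include any existing servers that should remain, and check the mmchnsd command reference for
prerequisites that apply to your release:
mmchnsd "gpfs47nsd:k145n07"
mmlsnsd -d "gpfs47nsd"     # verify the new server list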
The useNSDserver file system mount option can be used to set the order of access used in disk
discovery, and limit or eliminate switching from local access to NSD server access, or the other way
around. This option is specified using the -o flag of the mmmount, mount, mmchfs, and mmremotefs
commands, and has one of these values:
always
Always access the disk using the NSD server.
asfound
Access the disk as found (the first time the disk was accessed). No change of disk access from
local to NSD server, or the other way around, is performed by GPFS.
asneeded
Access the disk any way possible. This is the default.
never Always use local disk access.
For example, to always use the NSD server when mounting file system fs1, issue this command:
mmmount fs1 -o useNSDserver=always
To change the disk discovery of a file system that is already mounted: cleanly unmount it, wait for the
unmount to complete, and then mount the file system using the desired -o useNSDserver option.
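For example, for a hypothetical file system fs1 that is currently mounted on all nodes:
mmumount fs1 -a
mmmount fs1 -a -o useNSDserver=always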
For fast recovery times with Persistent Reserve, you should also set the failureDetectionTime configuration
parameter. For fast recovery, a recommended value would be 10. You can set this by issuing the
command:
mmchconfig failureDetectionTime=10
To determine if the disks on the servers and the disks of a specific node have PR enabled, issue the
following command from the node:
mmlsnsd -X
If the GPFS daemon has been started on all the nodes in the cluster and the file system has been
mounted on all nodes that have direct access to the disks, then pr=yes should be on all hdisks. If you do
not see this, there is a problem. Refer to the IBM Spectrum Scale: Problem Determination Guide for
additional information on Persistent Reserve errors.
Prerequisites
When you enable SMB protocol services, the following prerequisites must be met:
v The number of CES nodes must be 16 or lower.
v All CES nodes must be running the same system architecture. For example, mixing nodes based on
Intel and Power is not supported.
| v A valid mmuserauth configuration must exist.
When you add new CES nodes to a running system where the SMB protocol is enabled, the following
prerequisite must be met:
| v All SMB packages (gpfs.smb) must have the same version
v All CES nodes must be in SMB HEALTHY state. You can verify the health status of the SMB service by
using the mmces state show smb command.
When you remove a CES node from a running system where the SMB protocol is enabled, the following
prerequisite must be met:
v All CES nodes (except for the node that is being removed) must be in SMB HEALTHY state.
For more information about the SMB states, see mmces command in IBM Spectrum Scale: Command and
Programming Reference.
Issue the following commands to enable SMB and NFS services on all CES nodes:
v mmces service enable SMB
v mmces service enable NFS
GUI navigation
v To enable SMB services in the GUI, log on to the IBM Spectrum Scale GUI and select Services > SMB.
v To enable NFS services in the GUI, log on to the IBM Spectrum Scale GUI and select Services > NFS.
The protocol services that are used need to be started on all CES nodes:
mmces service start SMB -a
mmces service start NFS -a
After you start the protocol services, verify that they are running by issuing the mmces state show
command.
Note: The start and stop commands are maintenance commands. If you stop a service on a particular
protocol node without first suspending the node, the public IP addresses on that node stay with that
node even though the service is no longer available there. As a result, protocol clients that try to connect
to the service with these IP addresses fail. The NFS service might restart automatically after downtime if
the process shut down unexpectedly.
Note: The sequence for removing the file data access method is different for NFS and SMB:
v For NFS, you must remove the file data access method before you disable NFS.
v For SMB, you cannot remove the file data access method while SMB is enabled and running.
6. Issue the following command to disable the NFS service on the CES nodes:
mmces service disable NFS
Important: When you disable NFS, the NFS configuration is lost. To save the NFS configuration, back
up the contents of the /var/mmfs/ces/nfs-config/ directory on any protocol node.
Note: If file audit logging is already enabled for the file system on which the Object fileset resides, you
need to disable and then re-enable file audit logging on that file system.
mmaudit Device disable
As with most performance tuning, there is no single correct setting. A good starting point for tuning
worker counts is to set workers in the proxy-server.conf to auto so that one worker is started for every
core on a protocol node. The other servers can be set to a percentage of the number of cores on your
protocol nodes:
v object server set to 75% of core count
v container server set to 50% of core count
v account server set to 25% of core count
Depending on the load of other protocol workloads, the optimal settings for worker count might be
higher or lower than this on your system.
For example, if you have 16 cores in your protocol nodes, the following commands can be used to tune
your worker settings:
mmobj config change --ccrfile proxy-server.conf --section DEFAULT --property workers --value auto
mmobj config change --ccrfile object-server.conf --section DEFAULT --property workers --value 12
mmobj config change --ccrfile container-server.conf --section DEFAULT --property workers --value 8
mmobj config change --ccrfile account-server.conf --section DEFAULT --property workers --value 4
Prerequisites
All CES nodes must be running Red Hat Enterprise Linux 7 or later.
The block service is provided as the gpfs.scst RPM package. After the RPM installation, you need to
compile its Linux kernel modules by running the mmbuildgpl command on each protocol node.
To continue to enable the BLOCK service, type Yes and press Enter.
After you start the BLOCK service, verify that it is running by running the mmces state show command.
Note: Start and stop are maintenance commands. Stopping the BLOCK service on a protocol node,
without first suspending the node means that the public IP addresses stay with that node even if the
BLOCK service is not available on that node. In this event, protocol clients might attempt to connect
using these IP addresses and fail to connect to the BLOCK service.
See the hardware and operating system documentation to configure the iSCSI initiator. The following
example uses the Linux software iSCSI initiator running on a Red Hat Enterprise Linux 7 host.
Note: You can query the CES IPs by running the mmlscluster --ces command on the GPFS CES node.
iscsiadm -m discovery -t st -p 192.168.6.47
192.168.6.47:3260,1 iqn.1986-03.com.ibm:spectrumscale.192.168.6.47
4. Open an iSCSI session by logging in to the system:
iscsiadm -m node --targetname=iqn.1986-03.com.ibm:spectrumscale.192.168.6.47 --login
5. List the iSCSI devices:
lsscsi --scsi_id
[113:0:0:0] disk SCST_FIO volume1 302 /dev/sdz
To disable a protocol service, issue the mmces service disable command with the appropriate service
designation:
SMB Issue the following command:
mmces service disable SMB
Note: You can disable SMB only after you remove the authentication method or if the
authentication method is userdefined.
Important: Before you disable SMB protocol services, ensure that you save all the SMB
configuration information. Disabling SMB protocol services stops SMB on all CES nodes and
removes all configured SMB shares and SMB settings. It removes the SMB configuration
information from the CCR, removes the SMB clustered databases (trivial databases, or TDBs), and
removes the SMB-related config files in the /var/mmfs/ces directory on the CES nodes.
When you re-enable SMB, you must re-create and reconfigure all the SMB settings and exports.
NFS Issue the following command:
mmces service disable NFS
Before you disable NFS protocol services, ensure that you save all the NFS configuration
information by backing up the contents of the /var/mmfs/ces/nfs-config/ directory on any CES
node. Disabling the NFS service stops NFS on all CES nodes and removes the NFS configuration
information from the CCR and from the /var/mmfs/ces/nfs-config/ directory. Previous exports
are lost.
OBJ Issue the following command:
mmces service disable OBJ
Note: For more information on disabling the Object services, see “Understanding and managing
Object services” on page 245.
Client system authentication requirement: When you use GPFS clients or the NFS or SMB protocol to
access the files in an IBM Spectrum Scale file system, the authentication and ID mapping of users and
groups must be configured on the client operating system on which the file system or share is mounted.
You must configure the appropriate directory services (AD/LDAP/NIS) on that operating system, and
users and groups must be able to log in with their user IDs and group IDs. These are the actual
credentials that the file system will use to authenticate users and groups who try to access the file system
through the GPFS clients.
Depending on the requirement, the IBM Spectrum Scale system administrator needs to set up the
following servers:
v Microsoft Active Directory (AD) for file and object access
v Lightweight Directory Access Protocol server for file and object access
v Keystone server to configure local, AD, or LDAP-based authentication for object access. Configuring
Keystone is a mandatory requirement if you need to have Object access.
AD and LDAP servers are set up externally. You can configure either an internal or external Keystone
server. The installation and configuration of an external authentication server must be handled separately.
The IBM Spectrum Scale system installation manages the installation and setup of the internal Keystone
server.
Ensure that you have the following details before you start configuring AD-based authentication:
v IP address or host name of the AD server.
v Domain details such as the following:
– Domain name and realm.
– AD admin user ID and password to join the IBM Spectrum Scale system as machine account into
the AD domain.
v ID map role of the system is identified.
v Define the ID map range and size depending upon the maximum RID (sum of allocated and expected
growth).
v Primary DNS is added in the /etc/resolv.conf file on all the protocol nodes. It resolves the
authentication server system with which the IBM Spectrum Scale system is configured. This is a
mandatory requirement when AD is used as the authentication server because the DNS must be able to
resolve the AD domain.
To achieve high-availability, you can configure multiple AD domain controllers. While configuring
AD-based authentication, you do not need to specify multiple AD servers in the command line to achieve
high-availability. The IBM Spectrum Scale system queries the specified AD server for relevant details and
configures itself for the AD-based authentication. The IBM Spectrum Scale system relies on the DNS
server to identify the set of AD servers that are currently available in the environment and serving
the same domain.
Ensure that you have the following details before you start configuring LDAP based authentication:
v Domain details such as the base dn and the dn prefixes of groups and users; otherwise, default values
are used. The default user group suffix is ou=Groups,<base dn> and the default user suffix is
ou=People,<base dn>.
v IP address or host name of LDAP server.
v Admin user ID and password of LDAP server that is used during LDAP simple bind and for LDAP
searches.
v The secret key you provided for encrypting/decrypting passwords unless you have disabled
prompting for the key.
v NetBIOS name that is to be assigned for the IBM Spectrum Scale system.
v If you need to have secure communication between the IBM Spectrum Scale system and LDAP, the CA
signed certificate that is used by the LDAP server for TLS communication must be placed at the
specified location in the system.
v If you are using LDAP with Kerberos, create a Kerberos keytab file by using the MIT KDC
infrastructure.
v Primary DNS is added in the /etc/resolv.conf file on all the protocol nodes. It resolves the
authentication server system with which the IBM Spectrum Scale system is configured. Manual changes
that are made to the configuration files might get overwritten by the operating system's network
manager. So, ensure that the DNS configuration is persistent even after you restart the system. For more
information on the circumstances where the configuration files are overwritten, refer to the
corresponding operating system documentation.
When an IBM Spectrum Scale system is configured with LDAP as the authentication method, the IBM
Spectrum Scale system needs to connect to the LDAP server by using an administrative user ID and
password. This administrative user is referred to as the bind user.
It is recommended that the bind user is given only the privileges that are required by the storage
system, to mitigate any security concerns.
This bind user must at least have permission to query users and groups that are defined in the LDAP
server to allow the storage system to authenticate these users. The bind user information (bind dn) is also
used by the Samba server while making LDAP queries to retrieve required information from the LDAP
server.
Note: In the following sections, it is assumed that the user account for the bind user exists in the LDAP
directory server. The bind user distinguished name (also known as dn) used in the following examples is
uid=ibmbinduser,ou=people,dc=ldapserver,dc=com. This name needs to be updated based on the bind
user that is used with the IBM Spectrum Scale system.
The OpenLDAP server ACLs define the privileges that are required for the bind user.
The following example uses ACLs that are required for the bind user and other types of users for the
sake of completeness. It is likely that a corporate directory server has those ACLs configured already and only
the entries for bind user need to be merged correctly in the slapd configuration file (generally,
/etc/openldap/slapd.conf file on Linux systems). Follow the ACL ordering rules to ensure that correct
ACLs are applied.
### some attributes need to be readable so that commands like 'id user',
### 'getent' etc. can answer correctly.
access to attrs=cn,objectClass,entry,homeDirectory,uid,uidNumber,gidNumber,memberUid
    by dn="uid=ibmbinduser,ou=people,dc=ldapserver,dc=com" read
### The following will not list userPassword when ldapsearch is
### performed with bind user.
### Anonymous is needed to allow bind to succeed and users to
### authenticate, should be a pre-existing entry already.
access to attrs=userPassword
    by dn="uid=ibmbinduser,ou=people,dc=ldapserver,dc=com" auth
    by self write
    by anonymous auth
    by * none
access to dn.regex="sambadomainname=[^,]+,dc=ldapserver,dc=com"
    by dn="uid=ibmbinduser,ou=people,dc=ldapserver,dc=com" read
    by * none
The IBM Security Directory Server (formerly IBM Tivoli Directory Server) ACLs define the privileges
that are required for the bind user when that directory server is used.
These ACLs are provided in the LDIF format and can be applied by submitting the ldapmodify command.
dn: dc=ldapserver,dc=com
changetype: modify
add: ibm-filterAclEntry
ibm-filterAclEntry:access-id:uid=ibmbinduser,ou=people,dc=ldapserver,dc=com:
(objectClass=sambaSamAccount):normal:rsc:sensitive:rsc:critical:rsc
-
add:ibm-filterAclEntry
ibm-filterAclEntry:access-id:uid=ibmbinduser,ou=people,dc=ldapserver,dc=com:
(objectclass=sambaDomain):normal:rwsc:sensitive:rwsc:critical:rwsc
dn:uid=ibmbinduser,ou=people,dc=ldapserver,dc=com
add:aclEntry
aclentry: access-id:uid=ibmbinduser,ou=people,dc=ldapserver,dc=com:at.cn:r:at.
objectClass:r:at.homeDirectory:r:at.uid:r:at.uidNumber:s:
at.gidNumber:r:at.memberUid:r:at.userPassword:sc:at.sambaLMPassword:r:at.
sambaNTPassword:r:at.sambaPwdLastSet:r:at.sambaLogonTime:r:
at.sambaLogoffTime:r:at.sambaKickoffTime:r:at.sambaPwdCanChange:r:at.
sambaPwdMustChange:r:at.sambaAcctFlags:r:at.displayName:r:
at.sambaHomePath:r:at.sambaHomeDrive:r:at.sambaLogonScript:r:at.sambaProfilePath:
r:at.description:r:at.sambaUserWorkstations:r:
at.sambaPrimaryGroupSID:r:at.sambaDomainName:r:at.sambaMungedDial:r:at.
sambaBadPasswordCount:r:at.sambaBadPasswordTime:r:
at.sambaPasswordHistory:r:at.sambaLogonHours:r:at.sambaSID:r:at.sambaSIDList:r:at.
sambaTrustFlags:r:at.sambaGroupType:r:
at.sambaNextRid:r:at.sambaNextGroupRid:r:at.sambaNextUserRid:r:at.
sambaAlgorithmicRidBase:r:at.sambaShareName:r:at.sambaOptionName:r:
at.sambaBoolOption:r:at.sambaIntegerOption:r:at.sambaStringOption:r:at.
sambaStringListoption:r:at.sambaBadPasswordCount:rwsc:
at.sambaBadPasswordTime:rwsc:at.sambaAcctFlags:rwsc
### Storage system needs to be able to find samba domain account specified
### on the mmuserauth service create command.
### Uncomment ONLY if you want storage systems to create domain account when
### it does not exist.
dn: dc=ldapserver,dc=com
changetype: modify
add:ibm-filterAclEntry
ibm-filterAclEntry:access-id:uid=ibmbinduser,ou=people,dc=ldapserver,
dc=com:(objectclass=domain):object:grant:a
See the IBM Tivoli Directory Server Administration Guide for information about applying these ACLs on
the directory server.
The following sample LDIF file shows the minimum required samba attributes:
dn: cn=SMBuser,ou=People,dc=ibm,dc=com
changetype: modify
add: objectClass
objectClass: sambaSamAccount
-
add: sambaSID
sambaSID: S-1-5-21-1528920847-3529959213-2931869277-1102
-
add:sambaPasswordHistory
Note: Attributes must be separated with a dash as the first and only character on a separate line.
Perform the following steps to create the values for sambaNTPassword, sambaPwdLastSet, and
sambaAcctFlags, which must be generated by using a Perl module:
1. Download the module from http://search.cpan.org/~bjkuit/Crypt-SmbHash-0.12/SmbHash.pm.
Create and install the module by following the readme file.
2. Use the following PERL script to generate the LM and NT password hashes:
# cat /tmp/Crypt-SmbHash-0.12/gen_hash.pl
#!/usr/local/bin/perl
# Generate Samba LM and NT password hashes for the given user name and password.
use Crypt::SmbHash;
$username = $ARGV[0];
$password = $ARGV[1];
if ( !$password ) {
print "Not enough arguments\n";
print "Usage: $0 username password\n";
exit 1;
}
$uid = (getpwnam($username))[2];
my ($login,undef,$uid) = getpwnam($ARGV[0]);
# ntlmgen (exported by Crypt::SmbHash) computes the LM and NT hashes of the password.
ntlmgen $password, $lm, $nt;
printf "%s:%d:%s:%s:[%-11s]:LCT-%08X\n", $login, $uid, $lm, $nt, "U", time;
3. Generate the password hashes for any user as in the following example for the user test01:
# perl gen_hash.pl SMBuser test01
:0:47F9DBCCD37D6B40AAD3B435B51404EE:82E6D500C194BA5B9716495691FB7DD6:
[U ]:LCT-4C18B9FC
Note: The output contains login name, uid, LM hash, NT hash, flags, and time, with each field
separated from the next by a colon. The login name and uid are omitted because the command was
not run on the LDAP server.
4. Use the information from step 3 to update the LDIF file in the format that is provided in the example
at the beginning of this topic.
v To generate the sambaPwdLastSet value, use the hexadecimal time value from step 3 after the dash
character and convert it into decimal.
v A valid samba SID is required for a user to enable that user's access to an IBM Spectrum Scale
share. To generate the samba SID, multiply the user's UID by 2 and add 1000. The user's SID must
contain the samba SID from the sambaDomainName, which is either generated or picked up from the
LDAP server, if it exists. The following attributes for sambaDomainName LDIF entry are required:
dn: sambaDomainName=(IBM Spectrum Scale system),dc=ibm,dc=com
sambaDomainName: (IBM Spectrum Scale system name)
sambaSID: S-1-5-21-1528920847-3529959213-2931869277
sambaPwdHistoryLength: 0
sambaMaxPwdAge: -1
sambaMinPwdAge: 0
This entry can be created by the LDAP server administrator by using either of the following two
methods:
– Write and run a bash script similar to the following example:
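The following is a hedged sketch of such a script, not the original sample: the LDAP host name, the
bind DN, the system name, and the domain SID value are placeholders, and the SID prefix must be
reused consistently across systems as described in the Note that follows.
#!/bin/bash
SYSTEM_NAME="specscale"                                    # NetBIOS name of the IBM Spectrum Scale system
DOMAIN_SID="S-1-5-21-1528920847-3529959213-2931869277"     # Samba domain SID prefix
# Add the sambaDomainName entry shown above to the directory server.
ldapadd -h ldapserver.example.com -D "cn=Manager,dc=ibm,dc=com" -W -x <<EOF
dn: sambaDomainName=${SYSTEM_NAME},dc=ibm,dc=com
objectClass: sambaDomain
sambaDomainName: ${SYSTEM_NAME}
sambaSID: ${DOMAIN_SID}
sambaPwdHistoryLength: 0
sambaMaxPwdAge: -1
sambaMinPwdAge: 0
EOF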
Note: To enable access using the same LDAP server domain to more than one IBM Spectrum Scale
system or other IBM NAS like IBM SONAS or IBM V7000 Unified, the Samba domain SID prefix of
all of the systems must match. The Samba domain SID prefix is used to prepare the SID of
users/groups planning to access the IBM Spectrum Scale system via CIFS. So, if you change the
Samba domain SID for an IBM Spectrum Scale system on the LDAP server, you must restart the CES
Samba service on that IBM Spectrum Scale system for the change to take effect.
5. Submit the ldapmodify command as shown in the following example to update the user's
information:
# ldapmodify -h localhost -D cn=Manager,dc=ibm,dc=com -W -x -f /tmp/samba_user.ldif
Before you configure authentication for object, ensure that the object services are enabled. To enable object
services, use the mmces service enable obj command.
Prerequisites
Ensure that you have the following details before you start configuring local authentication for object
access:
v The keystone host name must be defined and configured on all protocol nodes of the cluster. This host
name resolves to one of the CES IP addresses, for example through a round-robin DNS. It could also be
a fixed IP of a load balancer that distributes requests to one of the CES nodes. This host name is also
used to create the keystone endpoints.
Note: By default, the IBM Spectrum Scale installation process configures object authentication with a
local keystone authentication method.
You can configure the following external authentication servers for file access:
v Active Directory (AD)
v Light Weight Directory Access Protocol (LDAP)
v Network Information Service (NIS)
Ensure that the following RPMs are installed on all the protocol nodes before you start configuring the
authentication method:
Note: If you try to configure the file authentication method manually with the mmuserauth CLI
command, the command displays an error message if the required RPMs are not installed on the nodes.
The error output includes a list of nodes in which some RPMs are not installed and a list of the missing
RPMs for each node.
On Red Hat Enterprise Linux nodes
v For AD:
– bind-utils
– krb5-workstation
v For LDAP:
– openldap-clients
– sssd and its dependencies (particularly sssd-common and sssd-ldap). It is a good idea to
install all the dependencies, as in the following example:
yum install sssd*
– krb5-workstation only if Kerberized authentication is planned.
v For NIS:
– sssd and its dependencies (particularly sssd-common and sssd-proxy)
– ypbind and its dependencies (yp-tools)
On SLES nodes
v For AD:
– bind-utils
– krb5-client
v For LDAP:
– openldap2-client
– sssd and its dependencies (particularly sssd-common, sssd-ldap, and sssd-krb5). It is a good
idea to install all the dependencies, as in the following example:
zypper install sssd*
– krb5-client only if Kerberized authentication is planned.
v For NIS:
– sssd and its dependencies (particularly sssd-common and sssd-proxy)
– ypbind and its dependencies (yp-tools)
On Ubuntu 16 nodes
v For AD:
Note: It is not recommended to run at log levels higher than 1 for extended periods of time, because this
could impact performance.
You can configure AD-based authentication with the following ID mapping methods:
v RFC2307
v Automatic
v LDAP
RFC2307 ID mapping
In the RFC2307 ID mapping method, the user and group IDs are stored and managed in the AD server
and these IDs are used by the IBM Spectrum Scale system during file access. The RFC2307 ID mapping
method is used when you want to have multiprotocol access. That is, you can have both NFS and SMB
access over the same data.
Automatic ID mapping
In the automatic ID mapping method, user ID and group ID are automatically generated and stored
within the IBM Spectrum Scale system. When an external ID mapping server is not present in the
environment or cannot be used, then this ID mapping method can be used. This method is typically used
if you have SMB only access and do not plan to deploy multiprotocol access. That is, the AD-based
authentication with automatic ID mapping is not used if you need to allow NFS and SMB access to the
same data.
LDAP ID mapping
In the LDAP mapping method, user ID and group ID are stored and managed in the LDAP server, and
these IDs are used by the IBM Spectrum Scale system during file access. The LDAP ID mapping method
is used when you want to have multiprotocol access. That is, you can have both NFS and SMB access
over the same data.
The ID map range is defined between a minimum and maximum value. The default minimum value is
10000000 and the default maximum value is 299999999, and the default range size is 1000000. This allows
for a maximum of 290 unique Active Directory domains.
The ID map range size specifies the total number of UIDs and GIDs that are assignable per domain. For
example, if the range is defined as 10000-20000 and the range size is defined as 2000 (--idmap-range
10000-20000:2000), five domains can be mapped, each consisting of 2000 IDs. Define the range size so that
at least three domains can be mapped. The range size is identical for all AD domains that are configured
by the IBM Spectrum Scale system. Choose an ID map range size that accommodates the highest
anticipated RID value among all AD users and groups in all of the AD domains that you expect to use,
and that allows for the planned growth in the number of AD users and groups. The ID map range size
cannot be changed after the IBM Spectrum Scale system is configured with Active Directory as the
authentication server.
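As a quick check of the relationship between the range and the range size, the following shell arithmetic reproduces the numbers quoted above (the default range and the 10000-20000:2000 example):
# Number of mappable AD domains = (range maximum - range minimum + 1) / range size
echo $(( (299999999 - 10000000 + 1) / 1000000 ))   # prints 290 for the default range and range size
echo $(( (20000 - 10000 + 1) / 2000 ))             # prints 5 for the 10000-20000:2000 example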
To change the ID mapping of an existing AD-based authentication configuration (to change the range
minimum value, decrease the range maximum value, or change the range size), you must complete the
following steps:
Important: If you do not perform the preceding three steps in sequence, results are unpredictable and
can include complete loss of data access.
You need to run the mmuserauth service create command with the following mandatory parameters to
create AD based authentication for file access:
v --type ad
v --data-access-method file
v --servers <server host name or IP address>
v --netbios-name <netBiosName>
v --user-name <admin-username>
v --unixmap-domains <unixDomainMap>. This option is mandatory if RFC2307 ID mapping is used. For
example, --unixmap-domains DOMAINS(5000-20000). Specifies the Active Directory domains for which the
user ID and group ID are fetched from the Active Directory server (RFC2307 schema attributes).
v --idmap-role master | subordinate. When automatic ID mapping is used, to have the same ID maps on
systems that share an Active File Management (AFM) relationship, you must export the ID mappings
from the system whose ID map role is master to the system whose ID map role is subordinate.
See the mmuserauth service create command for more information on each parameter.
Note: The primary Windows group that is assigned to an AD user must have a valid GID assigned
within the specified ID mapping range. The primary Windows group is usually located in the Member Of
tab in the user's properties. The primary Windows group is different from the UNIX primary group,
which is listed in the UNIX Attributes tab. A user is denied access if that user’s Windows primary group
does not have a valid GID assigned. The UNIX primary group attribute is ignored.
In the case of a mutual trust setup between two independent AD domains, DNS forwarding must be
configured between the two trusted domains.
The following provides an example of how to configure an IBM Spectrum Scale system with Active
Directory and automatic ID mapping.
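A representative invocation for this scenario is sketched below; the server name (myADserver), NetBIOS name (ess), and administrator user name are placeholders, and because automatic ID mapping is used the --unixmap-domains option is omitted:
# mmuserauth service create --type ad --data-access-method file --netbios-name ess
--user-name administrator --idmap-role master --servers myADserver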
You can configure IBM Spectrum Scale system authentication with Active Directory (AD) and RFC2307 ID
mapping or Active Directory (AD) with Kerberos NFS and RFC2307 ID mapping. In these authentication
methods, Active Directory stores the user credentials and the RFC2307 attributes on the same AD server,
so that the same server handles both authentication and ID mapping.
The following provides an example of how to configure the IBM Spectrum Scale system with Active
Directory and RFC2307 ID mapping for picking UNIX primary group:
1. Submit the mmuserauth service create command as shown in the following example:
# mmuserauth service create --type ad --data-access-method file --netbios-name ess
--user-name administrator --idmap-role master --servers myADserver --idmap-range-size 1000000
--idmap-range 10000000-299999999 --unixmap-domains 'DOMAIN(5000-20000:unix)'
It is recommended to adhere to the following best practices if you configure AD with RFC2307 as the
authentication method:
v Remove any internal ID mappings present in the system before you configure AD with RFC2307.
Otherwise, the system might detect the internal ID mappings instead of the RFC2307 ID mapping and
abort the operation with an error message. In such situations, you are expected to clean up the entire
authentication and ID mapping by using the mmuserauth service remove and mmuserauth service
remove --idmapdelete command and then reconfigure AD authentication and RFC2307 ID mapping.
v If data is already present on the system, a complete removal of the authentication and ID mapping can
cause permanent loss of data access.
v Using UIDs and GIDs greater than 1000 helps avoid overlap between the IDs of end users and the IDs
that are used by administrative users and operating system components of the IBM Spectrum Scale system.
You can use AD-based authentication and RFC2307 ID mapping if you want to use the AFM feature of
the IBM Spectrum Scale system.
Limitations of the mmuserauth service create command while configuring AD with RFC2307:
The mmuserauth service create command that is used to configure authentication has the following
limitations:
v The mmuserauth service create command does not check the two-way trust between the host domain
and the RFC2307 domain that is required for ID mapping services to function properly. The customer
is responsible for configuring the two-way trust relationship between these domains.
v The customer is responsible for installing RFC2307 on the desired AD server, and for assigning UIDs to
users and GIDs to groups. The command does not return an error if RFC2307 is not installed, or if a
UID or GID is not assigned.
Note: The bind_dn_pwd cannot contain the following special characters: semicolon (;), colon (:),
opening parenthesis '(', or closing parenthesis ')'.
The system displays the following output:
File authentication configuration completed successfully.
2. Issue the mmuserauth service list command to verify the authentication configuration as shown in the
following example:
# mmuserauth service list
Using LDAP with TLS secures the communication between the IBM Spectrum Scale system and the
LDAP server, assuming that the LDAP server is configured for TLS.
The LDAP server might need to handle the login requests and ID mapping requests from clients that use
the SMB protocol. Usually, the ID mapping requests are cached and they do not contribute to the load
on the LDAP server unless the ID mapping cache is cleared due to a maintenance action. If the LDAP
server cannot handle the load or a high number of connections, then the response to the login requests is
slow or it might time out. In such cases, users need to retry their login requests.
It is assumed that the LDAP server is set up with the required schemas installed to handle the
authentication and ID mapping requests. If you need to support SMB data access, the LDAP schema must
be extended so that additional attributes, such as the SID and the Windows password hash, can be stored
in the POSIX user object.
Note: The IBM Spectrum Scale system must not be configured with any authentication method before
using LDAP as the authentication system for file access.
See “Integrating with LDAP server” on page 184 for more information on the prerequisites for integrating
LDAP server with the IBM Spectrum Scale system.
In the following example, LDAP is configured with TLS as the authentication method for file access.
1. Ensure that the CA certificate for the LDAP server is placed in the /var/mmfs/tmp directory with the
name ldap_cacert.pem on the protocol node where the command is run. Verify that the CA certificate is
available with the required name at the required location, as shown in the following example:
# stat /var/mmfs/tmp/ldap_cacert.pem
File: /var/mmfs/tmp/ldap_cacert.pem
Size: 2130 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 103169903 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2015-01-23 12:37:34.088837381 +0530
Modify: 2015-01-23 12:16:24.438837381 +0530
Change: 2015-01-23 12:16:24.438837381 +0530
2. Issue the mmuserauth service create command as shown in the following example:
# mmuserauth service create --type ldap --data-access-method file
--servers myLDAPserver --base-dn dc=example,dc=com
--user-name cn=manager,dc=example,dc=com
--netbios-name ess --enable-server-tls
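After the command completes, you can verify the resulting configuration, as shown in the other examples in this chapter:
# mmuserauth service list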
Example for configuring LDAP with Kerberos as the authentication method for file access.
1. Ensure that the keytab file is also placed in the /var/mmfs/tmp directory with the name krb5.keytab on
the node where the command is run. Verify that the keytab file is available with the required name at
the required location:
# stat /var/mmfs/tmp/krb5.keytab
File: /var/mmfs/tmp/krb5.keytab
Size: 502 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 103169898 Links: 1
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2015-01-23 14:31:18.244837381 +0530
Modify: 2015-01-23 12:45:05.475837381 +0530
Change: 2015-01-23 12:45:05.476837381 +0530
Birth: -
2. Issue the mmuserauth service create command as shown in the following example:
# mmuserauth service create --type ldap --data-access-method file
--servers myLDAPserver --base-dn dc=example,dc=com
--user-name cn=manager,dc=example,dc=com
--netbios-name ess --enable-kerberos --kerberos-server myKerberosServer
--kerberos-realm example.com
The following provides an example of how to configure LDAP with TLS and Kerberos as the authentication
method for file access.
1. Ensure that the CA certificate for the LDAP server is placed in the /var/mmfs/tmp directory with the
name ldap_cacert.pem on the protocol node where the command is run. Verify that the CA certificate is
available with the required name at the required location, as shown in the following example:
# stat /var/mmfs/tmp/ldap_cacert.pem
File: /var/mmfs/tmp/ldap_cacert.pem
Size: 2130 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 103169903 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2015-01-23 12:37:34.088837381 +0530
Modify: 2015-01-23 12:16:24.438837381 +0530
Change: 2015-01-23 12:16:24.438837381 +0530
2. Ensure that the keytab file is placed in the /var/mmfs/tmp directory with the name krb5.keytab on the
node where the command is run. Verify that the keytab file is available with the required name at the
required location:
# stat /var/mmfs/tmp/krb5.keytab
File: /var/mmfs/tmp/krb5.keytab
Size: 502 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 103169898 Links: 1
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2015-01-23 14:31:18.244837381 +0530
Modify: 2015-01-23 12:45:05.475837381 +0530
Change: 2015-01-23 12:45:05.476837381 +0530
Birth: -
3. Issue the mmuserauth service create command as shown in the following example:
# mmuserauth service create --type ldap --data-access-method file
--servers myLDAPserver --base-dn dc=example,dc=com
--user-name cn=manager,dc=example,dc=com
--netbios-name ess --enable-server-tls --enable-kerberos
--kerberos-server myKerberosServer --kerberos-realm example.com
The following provides an example of how to configure LDAP without TLS or Kerberos as the authentication
method for file access.
1. Issue the mmuserauth service create command as shown in the following example:
# mmuserauth service create --type ldap --data-access-method file
--servers 192.0.2.18 --base-dn dc=example,dc=com
--user-name cn=manager,dc=example,dc=com --netbios-name ess
Ensure that you have the following details before you start NIS-based authentication:
v NIS domain name. This is case-sensitive.
v IP address or host name of the NIS server
v Primary DNS is added in the /etc/resolv.conf file on all the protocol nodes. It must resolve the
authentication server with which the IBM Spectrum Scale system is configured. Manual changes to the
configuration files might be overwritten by the operating system's network manager, so ensure that the
DNS configuration persists even after you restart the system. For more information on the circumstances
in which the configuration files are overwritten, refer to the corresponding operating system
documentation.
You need to run the mmuserauth service create command with the following mandatory parameters to
configure NIS as the authentication method:
v --type nis
v --data-access-method file
v --domain domainName
v --servers comma-delimited IP address or host name
For more information on each parameter, see the mmuserauth service create command.
The following provides an example of how to configure NIS as the authentication method for file access.
1. Issue the mmuserauth service create command as shown in the following example:
# mmuserauth service create --type nis --data-access-method file
--servers myNISserver --domain nisdomain3
Note: The ID map configuration file and ID mapping service can differ on various OS platforms.
General requirements
v For Kerberized NFS access, time must be synchronized across the KDC server, the IBM Spectrum Scale
cluster protocol nodes, and the NFS clients. Otherwise, access to an NFS export might be denied.
v For Kerberized NFSv3 access, NFS clients should mount NFS exports by using one of the configured
CES IP addresses.
v For Kerberized NFSv4 access, NFS clients can mount NFS exports by using either "one of the
configured CES IP addresses" or the "system account name" that is configured for FILE protocols
authentication. The "system account name" is the value that is specified for the --netbios-name option
in the mmuserauth CLI command during FILE protocols authentication configuration.
In the user-defined mode of authentication, IBM Spectrum Scale system administrators are not allowed to
use any of the GPFS commands to manage authentication. It is important for the end user to be aware of
the limitations, if any, of the authentication and ID mapping scheme that is implemented after the
user-defined mode of authentication is configured.
Note: If the end user wants to configure the authentication methods that are supported by the IBM
Spectrum Scale system, it is highly recommended to configure the authentication and ID mapping
methods by using the mmuserauth command instead of opting for the user-defined method of
authentication.
The IBM Spectrum Scale system administrator needs to specify that the user-defined mode of
authentication is used by using the --type userdefined option in the mmuserauth service create
command as shown in the following example:
# mmuserauth service create --type userdefined --data-access-method file
File Authentication configuration completed successfully.
Typically, user-defined authentication is used when existing GPFS customers are already using GPFS with
NFS and do not want to alter the authentication that is already configured on these systems. You can
configure user-defined authentication for both object and file access or for object or file alone.
Note: Authorization depends upon authentication and ID mapping that is configured with the system.
That is, the ACL control on exports, files, and directories depend on the authentication method that is
configured.
Ensure the following while using the user-defined mode of authentication for file access:
v Ensure that the authentication server and ID mapping server are always reachable from all the protocol
nodes. For example, if NIS is configured as the ID mapping server, you can use the ypwhich command
to ensure that NIS is configured and reachable from all the protocol nodes. Similarly, if LDAP is
configured as the authentication and ID mapping server, you can bind to the LDAP server from all
protocol nodes to verify that it is reachable (see the reachability sketch after this list).
v Ensure that the implemented authentication and ID mapping configuration is always consistent across
all the protocol nodes. This requires that the authentication server and ID mapping server are manually
maintained and monitored by the administrator. The administrator must also ensure that the
configuration files are not overwritten due to node restart and other similar events.
v Ensure that the implemented authentication and ID mapping-related daemons and processes across the
protocol nodes are always up and running.
v The users or groups that access the IBM Spectrum Scale system over the NFS and SMB protocols must
resolve to a unique UID and GID, respectively, on all protocol nodes, especially in implementations
where different servers are used for authentication and ID mapping. Check that the user and group
names that are registered in the ID mapping server resolve correctly.
For example:
# id fileuser
uid=1234(fileuser) gid=5678(filegroup) groups=5678(filegroup)
Note: However, there are some use cases where only NFSv3-based access to the IBM Spectrum Scale
system is used. In such cases, the user and group IDs are obtained from the NFS client and no ID
mapping is configured on the protocol nodes.
v If the IBM Spectrum Scale system is configured for multiprotocol support (that is, the same data is
accessed through both NFS and SMB protocols), ensure that the IDs of users and groups are consistent
across the NFS clients and SMB clients and that they resolve uniquely on the protocol nodes.
v Ensure that there is no conflict of UID and GID across users and groups that are accessing the system.
This must be strictly enforced, especially in multiprotocol-based access deployments.
v Ensure that the Kerberos configuration files, placed on all protocol nodes, are in synchronization with
each other. Ensure that the clients and the IBM Spectrum Scale system are part of the same Kerberos
realm or trusted realm.
v While deploying two or more IBM Spectrum Scale clusters, ensure that the ID mapping is consistent in
cases where you want to use IBM Spectrum Scale features like AFM, AFM-DR, and asynchronous
replication of data.
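The reachability checks referenced in the first item of this list can be scripted; the following is a minimal sketch in which the LDAP server name and base DN are placeholders:
# Run on each protocol node; a zero exit status means the LDAP server answered a base-scope query
ldapsearch -x -H ldap://myLDAPserver -b dc=example,dc=com -s base >/dev/null && echo "LDAP server reachable"
# For NIS, ypwhich reports the NIS server that the node is currently bound to
ypwhich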
The user-defined mode for object authentication integrates IBM Spectrum Scale Object Storage with the
externally hosted keystone server. Ensure the following while using the user-defined mode of
authentication for object access:
v Integration with external keystone server is supported over http and https.
v The specified object user must be defined while enabling and configuring object in the external
keystone server.
v The 'service' tenant/project must be defined in the external keystone server.
v The 'admin' role must be defined in the external keystone server.
v Ensure that the specified swift user has 'admin' role in 'service' tenant/project.
For example, the external keystone server must contain the following admin role definition for the swift user:
# openstack role list --user swift --project service
+----------------------------------+-------+---------+-------+
| ID | Name | Project | User |
+----------------------------------+-------+---------+-------+
| 90877d1913964e1eac05031e45afb46a | admin | service | swift |
+----------------------------------+-------+---------+-------+
v The users and projects must be mapped to the Default domain in Keystone.
v Object storage service endpoints must be correctly defined in the external keystone server.
For example, the external keystone server must contain the following endpoint for object-store:
# openstack endpoint list
+------------+--------+--------------+--------------+---------+-----------+--------------------------------------------------------------|
| ID | Region | Service Name | Service Type | Enabled | Interface | URL |
+------------+--------+--------------+--------------+---------+-----------+--------------------------------------------------------------|
| c36e..9da5 | None | keystone | identity | True | public | http://specscaleswift.example.com:5000/ |
| f4d6..b040 | None | keystone | identity | True | internal | http://specscaleswift.example.com:35357/ |
| d390..0bf6 | None | keystone | identity | True | admin | http://specscaleswift.example.com:35357/ |
| 2e63..f023 | None | swift | object-store | True | public | http://specscaleswift.example.com:8080/v1/AUTH_%(tenant_id)s |
| cd37..9597 | None | swift | object-store | True | internal | http://specscaleswift.example.com:8080/v1/AUTH_%(tenant_id)s |
| a349..58ef | None | swift | object-store | True | admin | http://specscaleswift.example.com:8080 |
+------------+--------+--------------+--------------+---------+-----------+--------------------------------------------------------------|
If the object authentication is set to 'user-defined' and an IP address/port number is set in the proxy
server configuration for keystone authentication, then that IP address is checked by using a simple
http(s) request. If the request fails, the AUTH_OBJ state is set to "degraded" and an 'external
keystone URL failure' event is logged. This does not cause the node to be flagged as bad, nor does it
cause any public IP movement.
v To view the current CES state, issue the following command:
mmces state show
You can use the following authentication methods for object access:
v Active Directory (AD)
v LDAP
v Local authentication
v User-defined (external keystone)
The AD-based and LDAP-based authentication methods use an external AD and LDAP server
respectively to manage the authentication. Local authentication is handled by a Keystone server that
resides within the IBM Spectrum Scale system.
The IBM Spectrum Scale system installation process configures the Keystone server that is required for
object access. By default, the IBM Spectrum Scale installation process configures object authentication with
a local Keystone authentication method. If you have an existing Keystone server that you want to use,
specify that it be used for authentication.
Before you configure object authentication method, ensure that the Keystone Identity service is properly
configured.
Note: Before you configure an authentication method for object access, ensure that all protocol nodes
have CES IP addresses assigned and you are issuing the authentication configuration command from the
protocol node that has one or more CES IP addresses assigned to it.
Before you start manually configuring the authentication method for object access, ensure that the
openldap-clients RPM is installed.
On each protocol node, issue the following command: yum install openldap-clients.
Note: This step is required only when the authentication type is AD/LDAP.
The mapping between user, role, and tenant is stored in the Keystone database. If you switch from one
authentication type to another you must delete the existing mapping definitions by issuing the following
command:
mmuserauth service remove --data-access-method object --idmapdelete
Note:
It is recommended to run the mmuserauth service check command as follows after configuring object
authentication using the mmuserauth service create command:
mmuserauth service check --data-access-method object -N cesNodes
If the mmuserauth service check command reports that any certificate file is missing on any of the nodes,
then run the following command:
mmuserauth service check --data-access-method object -N cesNodes --rectify
For more information about mmuserauth service check, see the topic mmuserauth command in the IBM
Spectrum Scale: Command and Programming Reference.
The local authentication method is useful when you want to create and maintain a separate set of users
for only object access. These users cannot use the local authentication credentials for accessing file data
that is hosted through NFS and SMB protocols. If you want to allow a user to access both file and object,
use an external authentication server such as AD or LDAP to manage user accounts and authentication
requests.
Note: File and object authentication must be configured with individual invocations of the mmuserauth
command, even if the authentication server is the same.
You need to use the mmuserauth service create command with the following mandatory parameters to
configure local authentication for object access:
v --type local
v --data-access-method object
v --ks-admin-user keystoneAdminName
For more information on each parameter, see the mmuserauth service create command.
1. To configure local authentication for object access, issue mmuserauth service create command as
shown in the following example:
# mmuserauth service create --data-access-method object --type local
--ks-dns-name c40bbc2xn3 --ks-admin-user admin
Note:
Local authentication is now configured for object access with SSL enabled.
To disable SSL and configure local authentication for object access again, use the following steps.
4. Remove existing local authentication for object access as follows.
mmuserauth service remove --data-access-method object
If you are also changing authentication type, remove authentication and ID mappings by using the
following commands in sequence.
mmuserauth service remove --data-access-method object
mmuserauth service remove --data-access-method object --idmapdelete
5. Configure local authentication without SSL for object access as follows.
mmuserauth service create --data-access-method object --type local
For object access, the AD server is set up to handle the authentication requests and AD is used as an
LDAP server. Unlike with file access, multiple AD domains are not supported.
Prerequisites
Ensure that you have the following details before you start AD-based authentication configuration:
v AD server details such as IP address or host name, user name, user password, base dn, and user dn.
v If you want to configure TLS with AD for secure communication between Keystone and AD, you need
to place the CA certificate that is used for signing the AD server setup for TLS under the following
directory of the node on which the mmuserauth service create command is run:
– /var/mmfs/tmp/ldap_cacert.pem
v The secret key you provided for encrypting/decrypting passwords unless you have disabled
prompting for the key.
See “Integrating with AD server” on page 183 for more information on the prerequisites for integrating
AD server with the IBM Spectrum Scale system.
The following parameters must be used with mmuserauth service create command to configure
AD-based authentication for object access:
v --type ad
v --data-access-method object
v --servers IP address or host name of AD. All user lookups by Keystone are done only against this
server. If multiple servers are specified, only the first server is used and the rest are ignored.
v --base-dn ldapBase
v [--pwd-file PasswordFile] --user-name | --enable-anonymous-bind - to provide the password from the
stanza file or to enable anonymous binding with the authentication server.
For more information on each parameter, see the mmuserauth service create command.
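A representative invocation built from the parameters listed above might look like the following sketch; all values are placeholders, and your deployment might require additional options (such as --user-dn, --ks-dns-name, --ks-admin-user, or --ks-swift-user):
# mmuserauth service create --type ad --data-access-method object
--servers myADserver --base-dn dc=example,dc=com
--user-name "cn=Administrator,cn=Users,dc=example,dc=com"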
To change the authentication method that is already configured for object access, you need to remove the
authentication method and ID mappings. For more information, see “Deleting the authentication and the
ID mapping configuration” on page 223.
Prerequisites
Ensure that you have the following details before you configure LDAP-based authentication:
v LDAP server details such as IP address or host name, LDAP user name, user password, base dn, and
user dn.
v If you want to configure TLS with LDAP for secure communication between Keystone and LDAP, you
need to place the CA certificate that is used for signing the LDAP server setup for TLS under the
following directory of the node on which the mmuserauth service create command is run:
– /var/mmfs/tmp/ldap_cacert.pem
v The secret key you provided for encrypting/decrypting passwords unless you have disabled
prompting for the key.
See “Integrating with LDAP server” on page 184 for more information on the prerequisites for integrating
LDAP server with the IBM Spectrum Scale system.
You need to issue the mmuserauth service create command to configure LDAP-based authentication
with the following parameters:
v --type ldap
v --data-access-method object
v --servers IP address or host name of LDAP (all user lookups by Keystone are done only against this
server; if multiple servers are specified, only the first server is used and the rest are ignored).
v --base-dn ldapBase
v [--pwd-file PasswordFile] --user-name | --enable-anonymous-bind - to provide the password from the
stanza file or to enable anonymous binding with the authentication server.
v --enable-server-tls, if TLS needs to be enabled.
v --user-dn ldapUserSuffix (LDAP container from where users are looked up)
v --ks-admin-user keystoneAdminUser from LDAP.
v --enable-ks-ssl, if SSL needs to be enabled. You need to have another set of certificates that are
placed in the standard directory.
v --enable-ks-casigning, if you want to use external CA signed certificate for token signing.
v --ks-swift-user swiftServiceUser from LDAP.
For more information on each parameter, see the mmuserauth service create command.
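The following sketch combines the parameters listed above into one representative command; all values are placeholders and the option set is illustrative rather than exhaustive:
# mmuserauth service create --type ldap --data-access-method object
--servers myLDAPserver --base-dn dc=example,dc=com
--user-name cn=manager,dc=example,dc=com --enable-server-tls
--user-dn ou=people,dc=example,dc=com
--ks-admin-user admin --ks-swift-user swift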
To change the authentication method that is already configured for object access, you need to remove the
authentication method and ID mappings. For more information, see “Deleting the authentication and the
ID mapping configuration” on page 223.
The following prerequisites must be met before you start configuring an external keystone server with the
IBM Spectrum Scale system.
v The external keystone server must be running and reachable from all protocol nodes.
v The keystone server administrator must create an object storage service for the required user, for object
authentication configuration.
To configure an external keystone server with the IBM Spectrum Scale system, enter the mmuserauth
service create command as shown in the following example:
mmuserauth service create --data-access-method object --type userdefined
--ks-ext-endpoint http://specscaleswift.example.com:35357/v3
--ks-swift-user swift
Configuring IBM Spectrum Scale for object storage with SSL-enabled external
keystone
1. Remove the object authentication along with the ID mapping ID if it is present by running one of the
following commands:
mmuserauth service remove --data-access-method object
mmuserauth service remove --data-access-method object --idmapdelete
2. Copy the CA certificate with the external keystone to the node where the mmuserauth command is
being run in directory /var/mmfs/tmp, for example:
/var/mmfs/tmp/ks_ext_cacert.pem
3. Configure the object authentication by running the mmuserauth service create command with the
--enable-ks-ssl option:
mmuserauth service create --data-access-method object --type userdefined
--ks-ext-endpoint https://specscaleswift.example.com:35357/v3
--ks-swift-user swift --enable-ks-ssl
You must create at least one account before adding users. An account contains a list of containers in the
object storage. You can also define quota at the account level. An object account represents a storage
location for a project rather than a specific user.
Note:
To work with this function in the IBM Spectrum Scale GUI, log on to the GUI and select Object >
Accounts.
1. To view the details for an existing account, issue the swift stat command:
swift stat --os-auth-url http://tully-ces-ip.adcons.spectrum:35357/v3 \
--os-project-name admin \
--os-project-domain-name Default \
--os-username admin \
--os-user-domain-name Default \
--os-password Passw0rd \
--auth-version 3
or
source openrc
swift stat
The system displays output similar to the following:
Account: AUTH_bea5a0c632e54eaf85e9150a16c443cet
Containers: 0
Objects: 0
Bytes: 0
X-Put-Timestamp: 1489046102.20607
X-Timestamp: 1489046102.20607
X-Trans-Id: tx73c9382f200d4bd88d866-0058c10a55
Content-Type: text/plain; charset=utf-8
Note: To avoid specifying these options with the swift command, you can source the ~/openrc file on the
protocol node. The swift command then looks like:
source ~/openrc
swift stat
2. To create a new account, do the following steps:
a. Use the openstack project create command to create a project.
For example, create the project 'salesproject' in the Default domain using the command:
# openstack project create salesproject --domain Default
The system displays output similar to the following:
+-------------+----------------------------------+
| Field | Value |
+-------------+----------------------------------+
| description | Description is displayed here. |
| domain_id | default |
| enabled | True |
| id | ec4a0bff137b4c1fb67c6fe8fbb6a37b |
| is_domain | False |
You can use an external Microsoft Active Directory or LDAP server or a local database as the back-end to
store and manage user credentials for user authentication. The authorization details such as relation of
users with projects and roles are maintained locally by the keystone server. The customer can select the
authentication server to be used. For example, if AD is already configured in the environment and the
users who need access to the object store are part of AD, then the customer can configure Keystone with
AD as the authentication and authorization back-end.
When the back-end authentication server is AD or LDAP, the user management operations such as
creating or deleting a user are the responsibility of the AD/LDAP administrator, who can optionally also
be the Keystone server administrator. When local authentication is used for object access, the user
management operations are done by the Keystone administrator. In the case of authorization, management
tasks such as creating roles and projects and associating users with them are done by the Keystone
administrator. The Keystone administration can be done through the Keystone V3 REST API or
by using an OpenStack python-based client.
Before you start creating object users, and projects, ensure that Keystone server is configured and the
authentication servers are set up properly.
Note:
v The OpenStack commands can be issued from any system from which the cluster is reachable.
v If the OpenStack command is run from any of the protocol nodes, then you can use the openrc file to
set the required environment that is used by OpenStack commands to manage the Keystone server. The
advantage of using the openrc file is that you are not required to enter the following details every time
you enter the commands: --os-identity-api-version, --os-username, --os-password,
--os-project-domain-name, --os-user-domain-name, --os-domain-id, and --os-auth-url.
v The user create, update, and delete operations are only applicable when local authentication method is
used for object access.
When creating a new user in the local database to support local authentication for object access, activate
the openrc file, which is located under /root/openrc by default. You can load the openrc profile by running
'source /root/openrc'; this automatically loads the required environment variables into your current shell
session. The result looks similar to the following:
export OS_AUTH_URL="http://cesobjnode:35357/v3"
export OS_IDENTITY_API_VERSION=3
export OS_AUTH_VERSION=3
export OS_USERNAME="admin"
export OS_PASSWORD="Passw0rd"
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_NAME=admin
export OS_PROJECT_DOMAIN_NAME=Default
Use the openstack user create command and manually enter the parameters as shown in the following
example to create new user in the local database to support local authentication for object access.
# openstack --os-identity-api-version 3 --os-username admin --os-password Passw0rd
--os-project-domain-name Default --os-user-domain-name Default --os-domain-id default
--os-auth-url http://specscaleswift.example.com:35357/v3 user create --password-prompt
--email [email protected] --domain default newuser1
User Password:
Repeat User Password:
+-----------+------------------------------------------------------------------------------+
| Field | Value |
+-----------+------------------------------------------------------------------------------+
| domain_id | default |
| email | [email protected] |
| enabled | True |
| id | 2a3ef8031359457292274bcd70e34d00 |
| name | newuser1 |
+-----------+------------------------------------------------------------------------------+
GUI navigation
To work with this function in the IBM Spectrum Scale GUI, log on to the GUI and select Object > Users.
Listing users
Use the openstack user list command as shown in the following example to list users who are created
in the local database:
# source $HOME/openrc
# openstack user list
+----------------------------------+----------+
| ID | Name |
+----------------------------------+----------+
| 2a3ef8031359457292274bcd70e34d00 | newuser1 |
| a95783144edd414aa236a3d1582a3067 | admin |
+----------------------------------+----------+
Use the openstack user set command to update the object user details. The following example shows
how to change the password:
# openstack user set --password Passw0rd newuser2
Use the openstack user delete command as shown in the following example to delete the users who are
created in the local database:
# openstack user delete newuser2
GUI navigation
To work with this function in the IBM Spectrum Scale GUI, log on to the GUI and select Object > Roles.
Perform the following steps to create a new project and add a user to the project with a specified role:
1. Submit the openstack project create command to create a new project:
# openstack project create newproject
+-------------+--------------------------------------------------------------------------+
| Field | Value |
+-------------+--------------------------------------------------------------------------+
| description | |
| domain_id | default |
| enabled | True |
| id | 2dfcbdb70b75435fb2015c86d46ffc0b |
| is_domain | False |
| name | newproject |
| parent_id | None |
+-------------+--------------------------------------------------------------------------+
2. Submit the openstack role add command to add a role to the user as shown in the following
example:
# openstack role add --user newuser1 --project newproject member
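To confirm the assignment, you can list the roles of the user in that project, similar to the check shown earlier in this chapter:
# openstack role list --user newuser1 --project newproject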
Listing endpoints
Use the openstack endpoint list command as shown in the following example to view the endpoints
that are available:
# openstack endpoint list
+------------+-----------+--------------+--------------+---------+-----------+---------------------------------------------------------------|
| ID | Region | Service Name | Service Type | Enabled | Interface | URL |
+------------+-----------+--------------+--------------+---------+-----------+---------------------------------------------------------------|
| c36e..9da5 | RegionOne | keystone | identity | True | public | http://specscaleswift.example.com:5000 |
| f4d6..b040 | RegionOne | keystone | identity | True | internal | http://specscaleswift.example.com:35357 |
| d390..0bf6 | RegionOne | keystone | identity | True | admin | http://specscaleswift.example.com:35357 |
| 2e63..f023 | RegionOne | swift | object-store | True | public | http://specscaleswift.example.com:8080/v1/AUTH_%(tenant_id)s |
| cd37..9597 | RegionOne | swift | object-store | True | internal | http://specscaleswift.example.com:8080/v1/AUTH_%(tenant_id)s |
| a349..58ef | RegionOne | swift | object-store | True | admin | http://specscaleswift.example.com:8080 |
+------------+-----------+--------------+--------------+---------+-----------+---------------------------------------------------------------|
Use cron as follows to configure a periodic task on one of the protocol nodes that purges expired tokens
hourly or based on the load in your environment.
# (crontab -l -u keystone 2>&1 | grep -q token_flush) || \
echo '@hourly /usr/bin/keystone-manage token_flush >/var/log/keystone/keystone-tokenflush.log 2>&1' \
>> /var/spool/cron/keystone
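You can confirm that the cron entry was added by listing the keystone user's crontab:
# crontab -l -u keystone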
Note: You are not allowed to delete both the authentication configuration and the ID mappings at the
same time. You need to remove the authentication configuration first and then the ID maps. The system
does not allow you to delete the ID maps without deleting the authentication configuration.
1. Issue the mmuserauth service list command to see the authentication method that is configured in
the system:
# mmuserauth service list
FILE access configuration: LDAP
PARAMETERS VALUES
-------------------------------------------------
ENABLE_ANONYMOUS_BIND false
ENABLE_SERVER_TLS false
ENABLE_KERBEROS false
USER_NAME cn=manager,dc=example,dc=com
SERVERS 10.0.100.121
NETBIOS_NAME eslhnode
BASE_DN dc=example,dc=com
USER_DN ou=people,dc=example,dc=com
GROUP_DN none
NETGROUP_DN ou=netgroup,dc=example,dc=com
USER_OBJECTCLASS inetOrgPerson
GROUP_OBJECTCLASS posixGroup
USER_NAME_ATTRIB cn
USER_ID_ATTRIB uid
KERBEROS_SERVER none
KERBEROS_REALM none
OBJECT access not configured
PARAMETERS VALUES
-------------------------------------------------
2. Issue the mmuserauth service remove command to remove the authentication configuration as shown
in the following example:
# mmuserauth service remove --data-access-method file
mmcesuserauth service remove: Command successfully completed.
3. Issue the mmuserauth service list command to verify whether the authentication configuration is
removed:
# mmuserauth service list
FILE access not configured
PARAMETERS VALUES
-------------------------------------------------
OBJECT access not configured
PARAMETERS VALUES
-------------------------------------------------
For more information, see mmuserauth command in the IBM Spectrum Scale: Command and Programming
Reference.
Deleting authentication configuration as shown in the previous example does not delete the ID maps. Use
the --idmapdelete option with the mmuserauth service remove command to remove ID maps that are
created for user authentication:
# mmuserauth service remove --data-access-method file --idmapdelete
mmuserauth service remove: Command successfully completed
The deletion of ID maps that are used for file access is only applicable when AD with Automatic ID
mapping or RFC2307 ID mapping is configured.
Deleting ID maps might also be required in the case of object access. The ID map delete option can be used
if the system administrator wants to clean up the entire Keystone authentication configuration, including
the mapping of users with projects and roles. Cleaning up of ID mapping information results in loss of
access to any existing data that is being accessed through the Object Storage interface. Deleting ID
mappings deletes user-role-projects mappings as well. Without these mappings, new users are unable to
access the old data unless the keystone administrator creates the mapping again for the new user. ID
maps are deleted in environments where the object protocol needs to be removed or the entire object
store needs to be erased. This is usually done in preproduction or test environments.
If you want to change the authentication method that is already configured for object access, you must
remove the authentication method and ID mappings by issuing the mmuserauth service remove
--data-access-method object and mmuserauth service remove --data-access-method object
--idmapdelete commands in sequence, as shown in the following example:
# mmuserauth service remove --data-access-method object
mmuserauth service remove: Command successfully completed
Note: When you delete the ID maps that are created for file or object access, ensure that all the protocol
nodes are in the healthy state. You can view the health status of protocol nodes by using the mmces state
show -a command.
For more information, see the topic mmuserauth command in the IBM Spectrum Scale: Command and
Programming Reference.
You can check the following authentication details by using the mmuserauth service check command:
v --data-access-method {file | object | all} Authentication method.
v [-N|--nodes] {node-list | cesNodes} Authentication configuration on each node. If the specified
node is not a protocol node, the check operation gets ignored on that node. If a protocol node is
specified, then the system checks configuration on that protocol node. If you do not specify a node, the
system checks the configuration of only the current node. To check authentication configuration on all
protocol nodes, specify -N cesNodes.
v --server-reachability Verify whether the authentication backend server is reachable. If object is
configured with external Keystone server, this check is not performed.
v [-r | --rectify ] Rectify the configuration for the specified nodes by copying any missing
configuration files or SSL/TLS certificates from another node.
For more information, see the topic mmuserauth command in the IBM Spectrum Scale: Command and
Programming Reference guide.
You can use the id command to see the list of users and groups fetched from the LDAP server. For
example:
# id ldapuser2
uid=1001(ldapuser2) gid=1001(ldapuser2) groups=1001(ldapuser2)
Authentication limitations
Consider the following authentication limitations when you configure and manage the IBM Spectrum
Scale system:
The following limitations exist for Active Directory (AD)-based authentication for object access:
v Only a single AD server is used. If the configured AD server is down, Keystone authentication fails.
v Multiple AD domains are not supported.
v Only Windows 2008 R2 and later are supported.
v Authentication is supported only for read access to the AD server. You cannot create a new user and
modify or delete an existing user from the IBM Spectrum Scale system. Only the AD server
administrator can do these tasks.
AD-based authentication
NFS with server-side group lookup and Active Directory authentication is supported only for Kerberized
NFS access. The reason is that the group membership of a user can be obtained on a CES node only after
the user has been authenticated on that node. With SMB, each new session is authenticated initially, which
is sufficient to provide that information. With NFS, only Kerberized access can reliably provide the
required information when Active Directory is used.
LDAP-based authentication
GUI navigation
To work with this function in the GUI, log on to the IBM Spectrum Scale GUI and select Protocols >
SMB Shares.
Note: An SMB share can only be created when the ACL setting of the underlying file system is -k
nfs4. In all other cases, mmsmb export add will fail with an error.
See “Authorizing protocol users” on page 319 for details and limitations.
GUI navigation
For the documentation of all supported options, see mmsmb command in the IBM Spectrum Scale: Command
and Programming Reference.
To see a list of supported configuration options for SMB shares, run the command:
mmsmb export change --key-info supported
For example, to change the descriptive comment for a share, run the command:
mmsmb export change smbshare --option 'comment=Project X export'
Note: Changes to SMB share configurations only apply to client connections that have been established
after the change has been made.
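To confirm that the option was applied, you can list the SMB export configuration; this is a usage hint rather than part of the original procedure:
mmsmb export list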
GUI navigation
To work with this function in the GUI, log on to the IBM Spectrum Scale GUI and select Protocols >
SMB Shares.
For more information, see Managing ACLs of SMB exports using MMC.
Examples:
1. %> mmsmb exportacl list smbexport
[smbexport]
ACL:\Everyone:ALLOWED/FULL
ACL:MYDOM06\Administrator:ALLOWED/FULL
2. %> mmsmb exportacl remove smbexport --user "\Everyone"
[smbexport]
ACL:MYDOM06\Administrator:ALLOWED/FULL
For details, see the information about managing the SMB share ACLs from a Windows client through the
MMC.
GUI navigation
To work with this function in the GUI, log on to the IBM Spectrum Scale GUI and select Protocols >
SMB Shares.
Attention: Listing a large number of entities (thousands of files, connections, locks, etc.) using Microsoft
Management Console (MMC) might take a very long time and it might impact the performance of the file
server. In these cases, it is recommended to use server-side administration tools. In certain cases like
listing a very large number of open files, the MMC might also time-out and show no results if the server
takes too long to collect the corresponding information.
Ensure that the following tasks are complete before you manage SMB shares:
v IBM Spectrum Scale is installed and configured.
v The SMB protocol is enabled and healthy SMB services are running on all protocol nodes.
v Required SMB shares are created and mounted from the Windows client.
v Microsoft Active Directory (AD) based authentication is set up. This includes:
– Cluster nodes and client are domain members.
– The client on which Microsoft Management Console (MMC) is running is a domain member.
– Accurate DNS information is configured. If active sessions are listed, MMC tries to do a reverse
pointer record lookup with DNS for every session (client IP), and if that fails then MMC hangs.
– Involved NetBIOS names can be resolved using DNS.
To use the Shared Folders Microsoft Management Console (MMC) snap-in, you must be a member of the
local administrators group of the cluster. After the cluster is joined to an AD domain, only the domain
admins group is a member of the administrators group of the cluster.
To add other users who can use the Shared Folders Microsoft Management Console (MMC) snap-in:
1. Connect to MMC as a user that is a member of the domain admins group.
2. Navigate to System Tools > Local Users and Groups and add a user to the local administrators
group.
For more information, see the Microsoft Management Console documentation.
Note: If there is a permissions related error when you click Shares, verify that you are a member of
the local administrators group of the cluster. For more information, see “Managing SMB shares using
MMC” on page 233.
4. In the Create A Shared Folder wizard, click Next.
5. In the Folder path field, enter the share path and click Next.
Note: The directory for the SMB share must already exist in the file system.
6. Enter the SMB share name and description, select the required offline setting, and then click Next.
7. Select the required SMB share permission setting and click Finish.
Note: If there is a permissions related error when you click Shares, verify that you are a member of
the local administrators group of the cluster. For more information, see “Managing SMB shares using
MMC” on page 233.
4. Do one of the following steps depending on whether you want to modify or remove SMB shares:
v To modify an SMB share:
a. In the right pane, right-click the SMB share that you want to modify, and then click Properties.
b. Modify the properties as required and click OK.
v To remove an SMB share:
a. In the right pane, right-click the SMB share that you want to remove, and then click Stop
Sharing.
Note: If there is a permissions related error when you click Shares, verify that you are a member of
the local administrators group of the cluster. For more information, see “Managing SMB shares using
MMC” on page 233.
4. In the right pane, right-click the SMB share for which you want to view or change the permissions
and then click Properties.
5. You can do one of the following:
v To view the permissions a user or a group has for the SMB share, on the Share Permissions tab,
under the "Group or user names" pane, click on the user name or the group name.
The permissions are displayed in the "Permissions for" pane.
v To change the permissions a user or a group has for the SMB share, on the Security tab, under the
"Group or user names" pane, click on the user name or the group name and then click Edit.
Note: Changes affect only the SMB share, not the ACL in the file system of the exported directory.
For information on permissions that you can change, see documentation for the Shared Folders
Microsoft Management Console (MMC) snap-in.
Note: If there is a permissions related error when you click Shares, verify that you are a member of
the local administrators group of the cluster. For more information, see “Managing SMB shares using
MMC” on page 233.
4. In the right pane, right-click the SMB share whose offline settings you want to modify, and then click
Properties.
5. On the General tab, click Offline Settings.
6. In the Offline Settings window, configure the offline settings of the SMB share. For information on
offline settings that you can configure, see documentation for the Shared Folders Microsoft
Management Console (MMC) snap-in.
GUI navigation
To work with this function in the GUI, log on to the IBM Spectrum Scale GUI and select Protocols > NFS
Exports.
For the documentation of all supported options, see mmnfs command in the IBM Spectrum Scale: Command
and Programming Reference.
For example, to grant another client IP address access to the NFS export, run the following command:
mmnfs export change /gpfs/fs01/nfs --nfsadd "10.23.23.23(Access_Type=RW)"
After the change is made, verify the configuration by running the following command:
mmnfs export list
Note: CES NFS does not restart NFS services if the export is removed dynamically. Refer to the mmnfs
command for more information.
| Note: You can use --nfsdefs or --nfsdefs-match as filters with the command. For more information, see
| the topic mmnfs command in the IBM Spectrum Scale: Command and Programming Reference.
To work with the NFS exports function in the GUI, log on to the IBM Spectrum Scale GUI and select
Protocols > NFS Exports.
Because the existing command to change an NFS export, mmnfs export change, requires a few seconds of
runtime for every invocation, an alternative method is provided to facilitate bulk changes to the NFS
configuration. This procedure can be used, for example, to quickly and easily add additional NFS clients
to an export, or to change NFS export attributes on a per-client basis for any existing NFS client
definition.
CAUTION:
The mmnfs export load <NFS_exports_config_file> command causes a server restart similar to a
configuration change. You can use mmnfs export change to avoid a server restart.
If there is at least one existing NFS export, use the following procedure to make changes to an NFS
exports configuration file:
1. Check that there is at least one existing NFS export by issuing the following command:
mmnfs export list -Y | grep nfsexports | grep -v HEADER
2. Check that there is at least one CES node in the cluster:
mmces node list -Y | grep -v HEADER
Note: The mmnfs export load command will conduct a check of the exports configuration file. If the
following message is displayed, check the syntax of the NFS exports configuration file, focusing on
the changes made in the previous step and try again:
mmnfs export load. The syntax of the NFS export configuration file to load is not correct:
/tmp/gpfs.ganesha.exports.conf.
11. Verify changes to the NFS configuration via the mmnfs export list command:
mmnfs export list -Y
If a long listing of all NFS exports is desired, use a keyword with the -n option. For example, with
/gpfs as the keyword (/gpfs is the root of each NFS file system in this case):
[11:00:48] xxxxx:~:% mmnfs export list -Y -n /gpfs
mmcesnfslsexport:nfsexports:HEADER:version:reserved:reserved:Path:Delegations:Clients:Access_Type:Protocols:Transports:Squash:Anonymous_uid:Anonymous_gid:SecType:PrivilegedPort:DefaultDelegations:Manage_Gids:NFS_Commit:
mmcesnfslsexport:nfsexports:0:1:::/gpfs/fs1/fset1:none:10.0.0.1:RO:3,4:TCP:NO_ROOT_SQUASH:-2:-2:SYS:FALSE:none:FALSE:FALSE:
mmcesnfslsexport:nfsexports:0:1:::/gpfs/fs1/fset1:none:*:RW:3,4:TCP:ROOT_SQUASH:-2:-2:SYS:FALSE:none:FALSE:FALSE:
Multiprotocol exports
Exports for SMB and NFS protocols can be configured so that they have access to the same data in the
GPFS file system.
To export data via NFS and SMB, first create an export for one protocol using the appropriate GPFS
command (for example, mmnfs export add). To export the same GPFS path via a second protocol, create
another export using the protocol-specific export management command (for example, mmsmb export
add), as shown in the sketch that follows.
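For example, assuming an existing GPFS directory /gpfs/fs01/shared (the path and the export name here
are placeholders), the same directory could be exported through both protocols roughly as follows:
mmnfs export add /gpfs/fs01/shared --client "*(Access_Type=RW)"
mmsmb export add shared /gpfs/fs01/shared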
Adding and removing exports does not delete or change any data in the GPFS file system. If access to a
GPFS path for a specific protocol needs to be removed at a later time, this can be done via the
corresponding command; doing so does not affect access to the same data configured via another
protocol.
For more information, see the Unified file and object access overview topic in the IBM Spectrum Scale:
Concepts, Planning, and Installation Guide.
These restrictions apply to the general areas of file locking (including share reservation and lock
semantics), recovery (reclaim), and cross-protocol notifications.
Access Control Lists (ACLs): In IBM Spectrum Scale, there is a single common ACL per file or directory
in the cluster file system that is used for POSIX, NFS, and SMB access. The SMB server converts each
Windows ACL into an NFSv4 ACL for the corresponding file system object.
Shared access (share modes, share reservations): Share modes are a feature of the SMB protocol that
allows clients to announce what type of parallel access should be allowed by other clients while the file is
open. NFSv4 share reservations are the equivalent of SMB share modes for the NFS protocol. There is no
equivalent in NFSv3, but IBM Spectrum Scale allows the SMB server to propagate the share modes into
the cluster file system so that NFS clients can honor share modes on commonly used files. The
corresponding SMB option is gpfs:sharemodes. NFSv4 share reservations are currently not supported.
Note that disabling the SMB option gpfs:sharemodes can result in data integrity issues, as SMB
applications can rely on the enforcement of exclusive access to data to protect the integrity of a file's data.
Because the SMB file server also performs share mode checks internally, gpfs:sharemodes can safely be
disabled for data that is only accessed through the SMB protocol.
The important point for POSIX and NFS applications is that file system share modes can cause the
open() or unlink() system calls to return EACCES. Applications must be prepared to handle this situation.
Details for the interaction of SMB with the file system sharemodes:
If another application already has the file open for reading and writing, and the share mode specified by
the SMB client conflicts with the existing open, then the SMB client cannot open the file and a
"sharing violation" error is returned to the SMB client.
Share modes in the file system are only enforced if the file is actually opened for READ, EXECUTE,
WRITE or APPEND access from the SMB client.
Another limitation of the share mode enforcement in the file system is that it is not possible to grant
parallel FILE_SHARE_READ and FILE_SHARE_WRITE access while not granting FILE_SHARE_DELETE
access. In this case, the file system does not enforce the FILE_SHARE_DELETE restriction and the file
can still be deleted.
These limitations only apply to enforcement of sharemodes in the file system. The SMB server also
performs internal sharemode checks and handles the sharemode correctly for all SMB access.
Note: The CES NFS server keeps the files accessed by NFSv3 open for a while for performance reasons.
This might lead to conflicts during concurrent SMB access to these files. You can use the following
command to find out whether the NFS server holds the specified file open:
ls /proc/$(pidof gpfs.ganesha.nfsd)/fd -l | grep <file-name>
Opportunistic Locking: Oplocks are a feature of the SMB protocol that allows clients to cache files locally
on the client. If the SMB server is set to propagate oplocks into the cluster file system (gpfs:leases), other
clients (NFS, POSIX) can break SMB oplocks. NFS4 delegations are currently not supported.
Byte-range locks: Byte-range locks from SMB clients are propagated into the cluster file system if the
SMB option "posix locking" is true. In that case, POSIX and NFS clients are made aware of those locks.
Note that for Windows, byte-range locks are mandatory, whereas for POSIX they are advisory.
File change notifications: SMB clients can request notifications when objects change in the file system.
The SMB server notifies its clients about the changes. The notifications include changes that are made by
POSIX and NFS clients on any CES node in the directory for which notifications are requested, but not in
its subdirectories. File changes initiated on non-CES cluster nodes do not trigger a notification.
Grace period: The grace period allows NFS clients to reclaim their locks and state for a certain amount of
time after a server failure. SMB clients are not aware of NFS grace periods. If you expect a lot of
contention between SMB and NFS, NFSv4 reclaims might fail.
Multiprotocol access of protocol exports is only allowed between NFSv4 and SMB. That is, you cannot
access the same export by using both the NFSv3 and SMB protocols. The reason is that SMB clients
typically request exclusive access to a file, which does not work with the CES NFS server that keeps files
accessed through NFSv3 open.
IBM Spectrum Scale uses the mmces service command to enable, start, stop, or disable Object services on
all protocol nodes.
The enable and disable operations are cluster-wide operations. To enable or disable the Object protocol,
use mmces service [enable | disable] OBJ. The Object protocol must have been initially configured
using the mmobj swift base command before it can be enabled in the cluster.
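As a hypothetical illustration (the file system path, host name, and password are placeholders, and
additional Keystone-related options may be required in your environment; see the mmobj command
reference), the initial configuration followed by cluster-wide enablement might look like this:
mmobj swift base -g /ibm/gpfs0 --cluster-hostname protocols.example.com --admin-user admin --admin-password Passw0rd -i 100000
mmces service enable OBJ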
CAUTION:
Disabling the object service unconfigures the Object protocol and discards OpenStack Swift
configuration and ring files from the CES cluster. If OpenStack Keystone is configured locally, disabling
object storage also discards the Keystone configuration and database files from the
CES cluster. However, to avoid accidental data loss, the associated filesets used for the object data are
not automatically removed during disable. The filesets for the object data and any filesets created for
optional object storage policies need to be removed manually. If you plan to re-enable the object
protocol after disabling it, it is recommended that you access the repository during object disablement
in order to reset to the original default object configuration. For enabling the object service
subsequently, either different fileset names need to be specified or the existing filesets need to be
cleaned up. For information on cleaning up the object filesets, see the steps "Remove the fileset created
for object" and "Remove any fileset created for an object storage policy"(if applicable) in the Cleanup procedures
required if reinstalling with the spectrumscale installation toolkit topic of IBM Spectrum Scale: Concepts,
Planning, and Installation Guide.
Note: To disable the object protocol, first remove the object authentication. For complete usage
information, see the mmuserauth command in the IBM Spectrum Scale: Command and Programming Reference.
In addition, the enabled Object service can be started and stopped on individual nodes or cluster-wide.
To start or stop the Object protocol cluster-wide, use the -a flag: mmces service [start | stop] OBJ -a.
To start or stop the Object protocol on individual nodes, use mmces service [start | stop] OBJ -N
<node>.
Attention: If object services on a protocol node are stopped by the administrator manually, access to
object data might be impacted unless the CES IP addresses are first moved to another node. There are
multiple ways to accomplish this, but the simplest is to suspend the node. After suspending a node, CES
automatically moves the CES IPs to the remaining nodes in the cluster. However, doing this suspends
operation for all protocols running on that protocol node.
If you want to stop object services on a protocol node, you can use the following steps:
1. Suspend CES operations on the protocol node using the mmces node suspend command.
2. View the CES IP addresses on that node using the mmces address list command and verify that all
CES IP addresses have been moved to other protocol nodes.
3. Stop the object services using the mmces service stop OBJ command.
To restore object services on that protocol node, you can use the following steps:
1. Resume CES operations on the protocol node using the mmces node resume command.
2. View the CES IP addresses on that node using the mmces address list command and verify that
CES IP addresses have been moved back to that protocol node.
3. Start the object services using the mmces service start OBJ command.
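Putting the two procedures together, a minimal command sequence run on the affected protocol node
might look like the following sketch. To take the node out of object service:
mmces node suspend
mmces address list
mmces service stop OBJ
To restore object services on the node afterwards:
mmces node resume
mmces service start OBJ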
Use the mmces service list command to list the protocols enabled on IBM Spectrum Scale. List a verbose
output of object services running on the local node using the -v flag as shown in the following example:
# mmces service list -v
Enabled services: OBJ SMB NFS
OBJ is running
OBJ:openstack-swift-object-updater is running
OBJ:openstack-swift-object-expirer is running
OBJ:ibmobjectizer is running
OBJ:openstack-swift-object-auditor is running
OBJ:openstack-swift-object is running
OBJ:openstack-swift-account is running
OBJ:openstack-swift-container is running
OBJ:memcached is running
OBJ:openstack-swift-proxy is running
OBJ:openstack-swift-object-replicator is running
OBJ:openstack-swift-account-reaper is running
OBJ:openstack-swift-account-auditor is running
OBJ:openstack-swift-container-auditor is running
OBJ:openstack-swift-container-updater is running
OBJ:openstack-swift-account-replicator is running
OBJ:openstack-swift-container-replicator is running
OBJ:openstack-swift-object-sof is running
OBJ:postgresql-obj is running
OBJ:httpd (keystone) is running
SMB is running
NFS is running
For complete usage information, see mmces command in IBM Spectrum Scale: Command and Programming
Reference.
Every object protocol node can access every virtual device in the shared file system, and some OpenStack
Swift object services can be optimized to take advantage of this by running from a single Object protocol
node.
Even though objects are not replicated by OpenStack Swift, the swift-object-replicator runs to
periodically clean up tombstone files from deleted objects. It is run on a single Object protocol node and
manages cleanup for all of the virtual devices.
The swift-object-updater is responsible for updating container listings with objects that were not
successfully added to the container when they were initially created, updated, or deleted. Like the object
replicator, it is run on a single object protocol node.
The following table shows each of the object services and the set of object protocol nodes on which they
need to be executed.
1 If unified file and object access is enabled.
2 If multi-region object deployment is enabled.
3 If local OpenStack Keystone Identity Service is configured.
4 Updated to httpd (keystone) on all nodes, if using local authentication.
In IBM Spectrum Scale, for Object storage, several OpenStack commands have been replaced with IBM
Spectrum Scale commands for easy maintenance. This section identifies those commands.
1. Ring Building
The swift-ring-builder command should only be used to view the object, container, and account ring
on any IBM Spectrum Scale protocol node. The user should not directly execute any commands that
modify the ring. All ring maintenance operations are handled automatically by the CES infrastructure.
For example, when a new CES IP address is added to the configuration, all rings are automatically
updated to distribute Swift virtual devices evenly across CES IP addresses.
The master copy of each ring builder file is kept in the IBM Spectrum Scale Cluster Configuration
Repository (CCR). Changes made locally to the ring files will be overwritten with the master copy
when monitoring detects a difference between the ring file in CCR and the file in /etc/swift.
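For example, to view (but not modify) the object ring on a protocol node, you might run the following
command, assuming the builder file has been propagated to /etc/swift:
swift-ring-builder /etc/swift/object.builder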
2. Configuration Changes
You can manage the Object configuration data in the Cluster Configuration Repository (CCR). When an
Object configuration value is changed, callbacks on each protocol node update that node with the change
and restart one or more Object services if necessary.
To change the Object configuration, use the mmobj command so that the change is made in the CCR and
propagated correctly across the Swift cluster.
For more details, see the mmobj command in the IBM Spectrum Scale: Command and Programming Reference.
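For example, to inspect a configuration value before changing it (the section and property shown here are
just an illustration), you can use the companion mmobj config list command:
mmobj config list --ccrfile "proxy-server.conf" --section "filter:swift3" --property "location"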
Perform the following steps if S3 API was not enabled as part of the object base configuration:
1. To enable S3 API, run the following command:
mmobj s3 enable
The system enables S3 API.
2. To verify that S3 API is enabled, run the following command:
mmobj s3 list
3. To disable S3 API, run the following command:
mmobj s3 disable
The system disables S3 API.
4. To verify that S3 API is disabled, run the following command:
mmobj s3 list
Remember: You can use the Swift3 Middleware for OpenStack Swift with S3 clients that are using the V2
or V4 S3 protocol.
The V2 protocol is the default. If you use the V4 protocol, make sure that the region of the request
matches the value of the location property in the filter:swift3 section of proxy-server.conf. The
default value for location in the Swift3 Middleware is US, which means that V4 S3 clients must set US as
the region. You can change the location value to something other than US by changing the property in the
proxy-server.conf file. Change the location by running the following command:
mmobj config change --ccrfile "proxy-server.conf" --section "filter:swift3" --property "location" --value "NEW_LOCATION"
Replace "NEW_LOCATION" with the appropriate value for your environment. Once you change the value,
any S3 clients that are using the V4 protocol must set their region to the same value.
The credentials are created by the OpenStackClient, a command-line client for OpenStack that allows the
creation and use of access/secret pairs for a user/project pair. This requires the operators to create the
access/secret pair for each user/project pair.
1. Source openrc with the admin credentials.
2. Create an EC2 credential by running the following command, providing a user-defined blob as the credential:
openstack credential create --type ec2 --project <project> <user> '{"access": "<aws_access_key>", "secret": "<aws_secret_key>"}'
Note: Ensure that you use Keystone UUIDs rather than names if duplicate user/project names exist across
domains. Additionally, the admin users should be able to list and delete access/secret pairs for a specific
user/project.
3. View all EC2 credentials by running this command:
openstack credential list
openstack credential show <credential-id>
4. You can change your Access Key ID and Secret Access Key if necessary.
It is recommended to rotate these keys regularly and to switch applications to use the new pair.
Change the EC2 credentials by running this command:
openstack credential set --type ec2 --data '{"access": <access>, "secret": <secret>}' --project <project> <credential-id>
You are now ready to connect to the IBM Spectrum Scale Object store using the Amazon S3 API. You
can connect with any S3-enabled client.
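As a hypothetical illustration with the AWS command-line interface (the endpoint host and credentials are
placeholders; if your client signs requests with the V4 protocol, its configured region must match the
swift3 location value described earlier):
aws configure set aws_access_key_id <aws_access_key>
aws configure set aws_secret_access_key <aws_secret_key>
aws --endpoint-url http://specscaleswift.example.com:8080 s3 ls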
If S3 API is enabled, the default value of s3_acl in the proxy-server.conf file is true. S3 API uses its
own metadata for ACLs, such as X-Container-Sysmeta-Swift3-Acl, to achieve the best S3 compatibility.
However, if s3_acl is set to false, S3 API first tries Swift ACLs, such as X-Container-Read, instead of
S3 ACLs.
For a user to use S3 API in IBM Spectrum Scale, the user must have a role defined for the swift project.
Any role suffices, because for S3 API there is no difference between the SwiftOperator role and other roles.
The owner of a resource is implicitly granted FULL_CONTROL instead of just READ_ACP and WRITE_ACP. This
is not a security issue because with WRITE_ACP, the owners can grant themselves FULL_CONTROL access.
For an overview of object capabilities, see Object capabilities in IBM Spectrum Scale: Concepts, Planning, and
Installation Guide.
v To list all object capabilities available cluster wide, use the mmobj config list command as follows:
mmobj config list --ccrfile spectrum-scale-object.conf --section capabilities
The system displays output similar to the following:
file-access-enabled: true
multi-region-enabled: true
s3-enabled: false
You can also list specific object capabilities using the mmobj config list command as follows:
mmobj config list --ccrfile spectrum-scale-object.conf --section capabilities
--property file-access-enabled
mmobj config list --ccrfile spectrum-scale-object.conf --section capabilities
--property multi-region-enabled
mmobj config list --ccrfile spectrum-scale-object.conf --section capabilities
--property s3-enabled
v Use the mmobj command to enable object capabilities:
– To enable or disable the file-access object capability, use the mmobj file-access enable or the mmobj
file-access disable command.
– To enable or disable the multiregion object capability, use the mmobj multiregion enable or the
mmobj multiregion disable command.
– To enable or disable the s3 object capability, use the mmobj s3 enable or the mmobj s3 disable
command.
If the section is already present, the command displays text like the following:
[filter:versioned_writes]
use = egg:swift#versioned_writes
b. If the section is not present, issue the following command to add it:
mmobj config change --ccrfile proxy-server.conf --section "filter:versioned_writes" --property "use" --value "egg:swift#versioned_writes"
The command displays the pipeline module list as in the following example:
pipeline = healthcheck cache . . . slo dlo versioned_writes proxy-server
The preceding command is an example. When you issue the command on your system, follow
these steps:
1) In the --value parameter, specify the actual list of pipeline modules that were displayed on
your system in the output of Step 3(a). Be sure to enclose the list in double quotation marks.
2) In the list of pipeline modules that you just specified, insert the versioned_writes pipeline
module immediately before the proxy-server module, as is shown in the preceding example.
To disable object versioning across the cluster, run the following command:
# mmobj config change --ccrfile proxy-server.conf --section
’filter:versioned_writes’ --property allow_versioned_writes --value false
Note: The container1 container contains only the latest version of the objects. The older versions of
object are stored in version_container.
9. Run swift list on version_container to see the older versions of the object:
swift list version_container
00aImageA.jpg/1468227497.47123
00aImageA.jpg/1468227509.48065
10. To delete the latest version of the object, perform the DELETE operation on the object:
swift delete container1 ImageA.jpg
ImageA.jpg
(deleted latest/third version)
After a storage policy is created, you can specify that storage policy while creating new containers to
associate that storage policy with those containers. When objects are uploaded into a container, they are
stored in the fileset that is associated with the container’s storage policy. For every new storage policy, a
new object ring is created. The ring defines where objects are located and also defines multi-region
replication settings.
The name of the fileset can be specified optionally as an argument of the mmobj policy create command.
An existing fileset can be used only if it is not being used for an existing storage policy.
If even one of these prerequisites is missing, the mmobj policy create command fails. Otherwise, the
fileset is used and the softlinks for the devices that are given to the ring builder point to it. If no fileset
name is specified with the mmobj policy create command, a fileset is created using the policy name as a
part of the fileset name with the prefix obj_.
For example, if a storage policy with name Test is created and no fileset is specified, a fileset with the
name obj_Test is created and is linked to the base file system for object:
<object base filesystem mount point>/obj_Test/<n virt. Devices>
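As a minimal sketch of that example, the Test policy could be created without specifying a fileset:
mmobj policy create Test
The command then creates and links the obj_Test fileset as described above.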
Attention: For any fileset that is created, its junction path is linked under the file system. The junction
path should not be changed for a fileset that is used for a storage policy. If it is changed, data might be
lost or it might get corrupted.
To enable swift to work with the fileset, softlinks under the given devices path in object-server.conf are
created:
<devices path in object-server.conf>/<n virt. Devices>
<object base filesystem mount point>/obj_Test/<n virt. Devices>
Before creating a storage policy with the file-access (unified file and object access) function enabled, the
file-access object capability must be enabled. For more information, see “Managing object capabilities” on
page 251 and Object capabilities in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
v To create a new storage policy with the unified file and object access feature enabled, run the following
command:
mmobj policy create sof-policy --enable-file-access
The system displays output similar to this:
[I] Getting latest configuration from ccr
[I] Creating fileset /dev/gpfs0:obj_sof-policy
[I] Creating new unique index and build the object rings
[I] Updating the configuration
[I] Uploading the changed configuration
v To list storage policies for object storage with details of functions available with those storage policies,
run the following command:
mmobj policy list --verbose
The system displays output similar to the following example:
Index Name Deprecated Fileset Fileset Path Functions Function Details File System
----------------------------------------------------------------------------------------------------------------------------------------
0 SwiftDefault object_fileset /ibm/cesSharedRoot/object_fileset cesSharedRoot
11751509160 sof-policy obj_sof-policy /ibm/cesSharedRoot/obj_sof-policy file-and-object-access regions="1" cesSharedRoot
11751509230 mysofpolicy obj_mysofpolicy /ibm/cesSharedRoot/obj_mysofpolicy file-and-object-access regions="1" cesSharedRoot
v To change a storage policy for object storage, run the following command:
mmobj policy change
v To change the default storage policy, run the following command:
mmobj policy change sof-policy --default
For more information about the mmobj policy command, see mmobj command in the IBM Spectrum Scale:
Command and Programming Reference.
where
MM = 0-59 minutes
HH = 0-23 hours
dd = 1-31 day of month
ww = 0-7 (0=Sun, 7=Sun) day of week
– Use * for specifying every instance of a unit. For example, dd = * means that the job is scheduled to
run every day.
– Comma separated lists are allowed. For example, dd = 1,3,5 means that the job is scheduled to run
on every 1st, 3rd, 5th of a month.
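For example, assuming the fields are supplied in the order listed above, a schedule value of 30 22 * 0,6
would run the job at 22:30 every Sunday and Saturday.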
Note: The download performance of objects in a compressed container is reduced compared to the
download performance of objects in a non-compressed container.
Note: The compression function enables the file system compression over the object file set. The same
compression functionality and restrictions apply to object compression and file compression.
Related concepts:
File compression
You can compress or decompress files either with the mmchattr command or with the mmapplypolicy
command with a MIGRATE rule. You can do the compression or decompression synchronously or defer it
until a later call to mmrestripefile or mmrestripefs.
To create a storage policy with the encryption function enabled, use the mmobj policy create command
as follows:
mmobj policy create PolicyName -f FilesetName -i MaxNumInodes
--enable-encryption --encryption-keyfile EncryptionKeyFileName
--force-rule-append
where:
PolicyName
The name of the storage policy to be created.
FilesetName
The fileset name that the created storage policy must use. Optional.
FilesystemName
The file system name where the fileset resides. Optional.
MaxNumInodes
The inode limit for the new inode space. Optional.
--enable-encryption
Enables an encryption policy.
EncryptionKeyFileName
The fully qualified path of the encryption key file.
--force-rule-append
Adds and establishes the rule if other rules already exist. Optional.
Note: The encryption function enables the file system encryption over the object file set. The same
encryption functionality and restrictions apply to object encryption and file encryption.
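For example, the following hypothetical invocation follows the syntax shown above (the policy name,
fileset name, and key file path are placeholders):
mmobj policy create EncPolicy -f obj_encpolicy -i 100000 --enable-encryption --encryption-keyfile /var/mmfs/etc/enckeyfile --force-rule-append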
In the command examples, Europe is the first region and Asia is the second region.
1. Export the information of the first region to a file by using the mmobj multiregion export command.
For example:
[europe]# mmobj multiregion export --region-file /tmp/multiregion_europe.dat
2. Copy the file manually to the second region.
For example:
[europe]# scp /tmp/multiregion_europe.dat asia:/tmp
3. From the second region, join the multi-region environment as follows:
a. Use the file generated in the first region while deploying object on the second region by using
the mmobj swift base command.
For example:
[asia]# mmobj swift base -g /mnt/gpfs0 --cluster-hostname gpfs-asia --admin-password xxxx
-i 100000 --admin-user admin \
--enable-multi-region --remote-keystone-url http://gpfs-asia:35357/v3
--join-region-file /tmp/multiregion_europe.dat \
--region-number 2 --configure-remote-keystone
This step installs the object protocol in the second region and joins it to the first region. Additional
devices are added to the primary ring files for this region.
4. Export the ring file data of the second region.
For example:
[asia]# mmobj multiregion export --region-file /tmp/multiregion_asia.dat
5. Copy the file manually to the first region.
For example:
[asia]# scp /tmp/multiregion_asia.dat europe:/tmp
Note:
Now the two clusters have been synced together and can be used as a multi-region cluster. Objects can be
uploaded and downloaded from either region. If the installation of the second region specified the
--configure-remote-keystone flag, a region-specific endpoint for the object-store service for the second
region is created in Keystone.
The regions need to be synced in the future any time region-related information changes. This includes
changes in the set of CES IP addresses (added or removed) or if storage policies were created or deleted
within a region. Changes that affect the swift.conf file or ring files need to be synced to all regions. For
example, adding additional CES addresses to a region causes the ring files to be rebuilt.
7. In the second region, add CES addresses and update other clusters.
For example:
[asia]# mmces address add --ces-ip asia9
This step adds an address to the CES IP pool. It also triggers a ring rebuild, which changes the
IP-to-device mapping in the ring files.
8. Export the ring data so the other clusters in the region can be updated with the new IPs from the
second region.
For example:
[asia]# mmobj multiregion export --region-file /tmp/multiregion_asia.dat
9. Copy the file manually to the first region.
For example:
[asia]# scp /tmp/multiregion_asia.dat europe:/tmp
10. In the first region, update with changes for the new second region address in the ring.
For example:
[europe]# mmobj multiregion import --region-file /tmp/multiregion_asia.dat
This step imports the changes from the second region. When it completes, a checksum is displayed
that can be used to determine whether the regions are synchronized: if it matches the checksum printed
when the region data was exported, the regions are in sync. In some cases, the checksums do not match
after import. This is typically due to local configuration changes on this cluster that are not yet synced
to the other regions. If the checksums do not match, this region's configuration needs to be exported
and imported into the other region to sync them.
A multi-region environment consists of several independent storage clusters linked together to provide
unified object access. Configuration changes in one cluster which affect the multi-region environment are
not automatically distributed to all clusters. The cluster which made the configuration change must
export the relevant multi-region data and then the other regions must import that data to sync the
multi-region configuration. Changes which affect multi-region are:
v Changes to the CES IP pool, such as adding or deleting addresses, which affect the ring layout.
Use the following commands to manage the configuration of the multi-region environment:
v To export the data for the current region so that it can be integrated into other regions, use the
following command. The RegionData file created can be used to update other regions:
mmobj multiregion export --region-file RegionData
The RegionData file is created and it contains the updated multi-region information.
v To import the multi-region data to sync the configuration, use the following command. The RegionData
must be the file created from the mmobj multiregion export command:
mmobj multiregion import --region-file RegionData
As part of the export/import commands, a region checksum is printed. This checksum can be used to
ensure that the regions are in sync. If the checksums match, then the multi-region configuration of the
clusters matches. In some cases, the checksums do not match after import. This happens when the cluster
performing the import has local configuration changes that have not been synced with the other regions,
for example, a storage policy was created but the multi-region configuration was not synced with the
other regions. When this happens, the import command prints a message that the regions are not fully in
sync because of the local configuration and that the region data must be exported and imported to the
other regions. Once all regions have matching checksums, the multi-region environment is in sync.
An existing region can be completely removed from the multi-region environment. This action
permanently removes the region configuration, and the associated cluster cannot rejoin the multi-region
environment.
The cluster of the removed region needs to disable object services because it cannot be used as a
standalone object deployment.
v To remove a previously defined region from the configuration, use the following command:
mmobj multiregion remove --remove-region-number RegionNumber
The remove command must be run from a different region than the one being removed. The cluster
associated with the removed region must clean up object services as appropriate with the mmces
service disable OBJ -a command to uninstall object services.
v You can display the current multi-region information using the following command:
mmobj multiregion list
Important: In a unified file and object access environment, object ACLs apply only to accesses through
the object interface and file ACLs apply only to accesses through the file interface.
For example: If user Bob ingests a file from the SMB interface and user Alice does not have access to that
file from the SMB interface, it does not mean that Alice does not have access to the file from the object
interface. The access rights of Alice for that file or object from the object interface depend on the ACL
defined for Alice on the container in which that file or object resides.
Note: Before you enable object access for the existing filesets, ensure that SELinux is in the Permissive or
Disabled mode.
You can do the following operations. All the commands are on one line:
1. The following command enables object access to the fileset and updates the container listing with the
existing files:
mmobj file-access link-fileset --sourcefset-path /gpfs1/legacy_fset1
--account-name admin --container-name cont1 --fileaccess-policy-name sof_policy
--update-listing
The command also creates the soft link gpfs1-legacy_fset1. The link name is constructed according
to the following format: <file_system_name>-<fileset_name>.
2. Both of the following commands upload an object newobj to the linked fileset path
/gpfs1/legacy_fset1. Both commands use the soft link gpfs1-legacy_fset1 that is created in the
preceding example. You can use either method:
v The following example uses the swift utility:
swift upload -H "X-Storage-Policy: sof_policy" cont1 ’gpfs1-legacy_fset1/newobj’
v The following example uses the curl utility:
curl -X PUT -T newobj -H "X-Storage-Policy: sof_policy"
http://specscaleswift.example.com:8080/v1/AUTH_cd1a29013b6842939a959dbda95835df/cont1/gpfs1-legacy_fset1/newobj
The following command creates a new directory newdir and uploads the object newobj1 to it:
swift upload -H "X-Storage-Policy: sof_policy" cont1 ’gpfs1-legacy_fset1/newdir/newobj1’
3. The following command lists the contents of the container cont1:
swift list cont1
You can do the following operations. All the commands are on one line:
1. The following command enables object access to the fileset and creates the link gpfs1-legacy_fset2
but does not update the container listing with existing files:
mmobj file-access link-fileset --sourcefset-path /gpfs1/legacy_fset2
--account-name admin --container-name cont2 --fileaccess-policy-name sof_policy
2. The following command uploads the object newobj to the linked fileset path /gpfs1/legacy_fset2:
swift upload -H "X-Storage-Policy: sof_policy" cont2 ’gpfs1-legacy_fset2/newobj’
3. The following command lists the contents of the container cont2:
swift list cont2
Note that the command displays only the objects that are added to the fileset, either by uploading the
object or by specifying the --update-listing parameter with the mmobj file-access link-fileset command.
Here the only such object is newobj. The command does not list the existing file existingfile2.
Enabling access with a fileset path from a different object file system
You can enable object access to an existing non-object fileset path where the fileset path is derived from a
different object file system. To do so, omit the --update-listing parameter. You can access the data with
the utilities swift or curl. However, the container listing is not updated with the existing file entries and
object metadata is not appended to the existing data.
Unified file and object access comprises the following two modes:
v local_mode: Separate identity between object and file (Default mode)
v unified_mode: Shared identity between object and file
The mode is represented by the id_mgmt configuration parameter in the object-server-sof.conf file.
You can change this parameter by using the mmobj config change command. For more information, see
“Configuring authentication and setting identity management modes for unified file and object access” on
page 270.
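To check which mode is currently configured, you can list the parameter, as also shown later in this
section:
mmobj config list --ccrfile object-server-sof.conf --section DEFAULT --property id_mgmt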
The authentication source (AD, LDAP, or local LDAP) can be the same for file and object access, or it can
be different.
Note: If your deployment uses only SMB-based file interface for file access and file authentication is
configured with Active Directory (AD) with Automatic ID mapping, unified file and object access can
be used, assuming that object is configured with the same AD domain.
v Ownership: An object created from the object interface is owned by the user who performs the object
PUT operation.
v If the object already exists, existing ownership of the corresponding file is retained if retain_owner is
set to yes in object-server-sof.conf. For more information, see “Configuration files for IBM Spectrum
Scale for object storage” on page 280.
v Authorization: Object access follows the object ACL semantics and file access follows the file ACL
semantics.
v Retaining ACL, extended attributes (xattrs), and Windows attributes (winattrs): If the object is created
or updated over an existing file, then the existing file ACL, xattrs, and winattrs are retained if retain_acl,
retain_xattr, and retain_winattr are set to yes in object-server-sof.conf. For more information, see
“Configuration files for IBM Spectrum Scale for object storage” on page 280.
v When a user does a PUT operation for an object over an existing object, or does a PUT operation for a
fresh object over a nested directory, no explicit file ACL is set for that user. This means that in some
cases the user might not have access to that file from the file interface even though the user has access
from the object interface. This is done to prevent the object interface from changing the file ACL, in
order to maintain file ACL semantics. In these cases, if the user also requires access to the file from the
file interface, explicit file ACL permissions need to be set from the file interface.
For example: If user Bob performs a PUT operation for an object over an existing object (object maps to
a file) owned by user Alice, Alice continues to own the file and there is no explicit file level ACL that is
set for Bob for that file. Similarly, when Bob performs a PUT operation for a new object inside a
subdirectory (already created by Alice), no explicit file ACL is set on the directory hierarchy for Bob.
Note: Unified file and object access retains the extended attributes (xattrs), Windows attributes (winattrs),
and ACL of the file if there is a PUT request for an object over an existing file. However, the security and
system namespaces of extended attributes and other IBM Spectrum Scale extended attributes, such as
immutability and pcache, are not retained. Swift metadata (user.swift.metadata) is also not retained; it is
replaced according to object semantics, which is the expected behavior.
IBM Spectrum Scale offers various features that leverage user identity (UIDs or GIDs). With
unified_mode, you can use these features seamlessly across file and object interfaces.
v Unified access to object data: User can access object data using SMB or NFS exports using their AD or
LDAP credentials.
v Quota: GPFS quotas, which operate on UIDs and GIDs, can be set so that they apply to both the file
and the object interface.
Example: User A can be assigned a quota of X on a unified access fileset using GPFS quota commands,
and that quota then applies to all data created by the user from the file or the object interface.
For more information, see the Quota related considerations for unified_mode section.
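As a hypothetical sketch (the file system, fileset, and user names are placeholders; verify the exact syntax
in the mmsetquota command reference), such a per-user quota could be set as follows:
mmsetquota gpfs0:obj_sof-policy --user bob --block 10G:12G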
The fileset quotas and container level quotas as well as fileset quotas and account quotas are independent
of each other.
In some cases, the fileset quota should be cumulative of all the container quotas hosted over it, though
this is not mandatory. When quotas are set both at the fileset level and at the container level, and the
fileset quota is reached, no more object data can be written to any of the containers hosted on that fileset.
The objectization process does not take into account the container quota and the account quota. This
means that there might be a scenario where a container hosts more data than the container quota
associated with it, especially when the ibmobjectizer service has objectized files as objects.
In this example scenario, note that the object access will be restricted if either the user quota or the fileset
quota is reached, even though the container quota is not reached.
In this mode, all the objects created continue to be owned by the swift user, that is, the administrative
user under whose context the object server runs on the system. Because in this mode there is no ID
mapping of objects to user IDs, object authentication and file authentication can each be configured with
any supported authentication scheme.
For supported authentication schemes, see the Authentication support matrix table in the Authentication
considerations topic in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
This mode allows objects and files to be owned by the UID and the corresponding GID of the users that
created them. This mode requires both the object protocol and the file protocol to be configured with the
same authentication scheme. The supported authentication schemes for the unified
mode are:
v AD for Authentication + RFC 2307 for ID mapping
v LDAP for authentication as well as for ID mapping
Note: User-defined authentication is not supported with both the identity management modes.
With an AD configuration, file authentication must be configured with a UNIX-mapped domain, and
object authentication must be configured with the same AD domain. This AD domain should be set in
the object-server-sof.conf configuration file as: ad_domain = <AD domain name>
3. Configure the file authentication and the object authentication against the same server:
FILE : SERVERS 9.118.37.234
OBJECT : SERVERS 9.118.37.234
Note: If there are multiple domain controllers in AD, the values might not match. The administrator
must ensure that the server is referring to same user authentication source.
4. Ensure that the object users are receiving the correct UIDs and GIDs from the authentication source.
The following example uses userr as the object user:
cat openrc
export OS_AUTH_URL="http://127.0.0.1:35357/v3"
export OS_IDENTITY_API_VERSION=3
export OS_AUTH_VERSION=3
export OS_USERNAME="userr"
When new files are added from the file interface, they need to be visible to the Swift database to show
correct container listing and container or account statistics.
The ibmobjectizer service ensures synchronization between the file metadata and the object metadata at
a predefined time interval, which ensures accurate container and account listings. The ibmobjectizer
service identifies new files added from the file interface and adds the Swift system metadata to them so
that they are objectized. The ibmobjectizer service then determines their container and account databases
and adds a new object entry to those. It also identifies files deleted from the file interface and deletes
their corresponding entries from the container and account databases.
This is particularly useful in setups where data is ingested using legacy file-interface-based devices, such
as medical and scientific devices, and needs to be stored and accessed over the cloud using the object
interface.
The ibmobjectizer service is a singleton and it is started when object is enabled and the file-access object
capability is set. However, the ibmobjectizer service starts objectization only when there are containers
with unified file and object access storage policies configured and the file-access object capability is set.
For use cases in which objects are always ingested using the object interface and the file interface is used
only for reading them, the ibmobjectizer service is not needed and can be disabled using the mmobj
file-access command. For more information, see “Starting and stopping the ibmobjectizer service” on
page 269.
To identify the node on which the ibmobjectizer service is running, use the mmces service list
--verbose command.
For example, if you have a cluster that has gpfsnode3 as the object singleton node, run the following
command:
mmces service list --verbose -a | grep ibmobjectizer
Attention: If object services on the singleton node are stopped by the administrator manually,
objectization is stopped across the cluster. Therefore, manually stopping services on a singleton node
must be planned carefully after understanding its impact.
For information on limitations on the objectizer process, see “Limitations of unified file and object access”
on page 277.
Unified file and object access stores objects following the same path hierarchy as the object's URL. In
contrast, the default object implementation stores the object following the mapping given by the ring, and
its final file path cannot be determined by the user easily. For example, an object with the following URL
is stored by the two systems as follows:
v Example object URL: https://swift.example.com/v1/acct/cont/obj
v Path in default object implementation: /ibm/gpfs0/object_fileset/o/z1device108/objects/7551/125/
75fc66179f12dc513580a239e92c3125/75fc66179f12dc513580a239e92c3125.data
v Path in unified file and object access: /ibm/gpfs0/obj_sofpolicy1/s69931509221z1device1/
AUTH_763476384728498323747/cont/obj
In this example, it is assumed that the object is configured over the /ibm/gpfs0 file system with the
default object located on the object_fileset fileset and the unified file and object access data is located
under the obj_sofpolicy1 fileset. s69931509221z1device1 is auto-generated based on the swift ring
parameters and AUTH_763476384728498323747 is auto-generated based on the account ID from keystone.
Attention: Do not unlink object filesets including the unified file and object access enabled filesets.
Determining the POSIX path of a unified file and object access enabled fileset
Use the following steps for determining the POSIX path of a unified file and object access enabled fileset.
1. List all storage policies for object.
mmobj policy list
The swift ring builder creates a single virtual device for unified file and object access policies, named
with storage policy index number, which is also the region number, starting with s and appended with
z1device1.
s13031510160z1device1
3. List the swift projects and identify the one you are interested in working with.
The full path to the unified file and object access containers is the concatenation of the fileset linkage, the
virtual device name, and the account name:
/fileset linked path/s<region_number>z1device1/AUTH_account id/
Note: To substitute the correct project prefix, see Managing object users, roles, and projects in IBM Spectrum
Scale: Administration Guide.
5. List the containers defined for this account.
ls /ibm/fs0/obj_sof-policy1/s13031510160z1device1/AUTH_73282e8bca894819a3cf19017848ce6b/
new1 fifthcontainer RTC73189_1 RTC73189_3 RTC73189_5 RTC73189_7 sixthcontainer
The ibmobjectizer service can be started and stopped by running the following commands.
Note: Objectization is a resource-intensive process. The resource utilization is related to the number of
containers that have unified file and object access enabled. The schedule for running the objectization
process must be planned carefully; running it too frequently might impact your protocol nodes' resource
utilization. It is recommended to either schedule it during off-business hours, especially if you have a
small number of protocol nodes (say, 2) with a basic resource configuration, or schedule it with an
interval of 30 minutes or more if you have protocol nodes with adequate resources (where the number of
protocol nodes > 2). In any case, it is recommended to set the objectizer service interval to a minimum of
30 minutes irrespective of your setup. If you need to urgently objectize files, you can use the mmobj
file-access command, which allows you to immediately objectize the specified files.
v Set up the objectization interval using the mmobj config change as follows.
mmobj config change --ccrfile spectrum-scale-objectizer.conf \
--section DEFAULT --property objectization_interval --value 2400
This command sets an interval of 40 minutes between the completion of an objectization cycle and the
start of the next cycle.
v Verify that the objectization time interval is changed using the mmobj config list as follows.
mmobj config list --ccrfile spectrum-scale-objectizer.conf --section DEFAULT
--property objectization_interval
The periodic scans run by the ibmobjectizer service are resource intensive and might affect the object IO
performance. Quality Of Service (QOS) can be set on the ibmobjectizer service depending upon the IO
workload and the priority at which the ibmobjectizer service must be run. The usage of resources is
limited to the given number so that other high priority workflows and processes can continue with
adequate resources, thereby maintaining the performance of the system.
v To enable QOS, type mmchqos <fs> --enable.
v Set the qos_iops_target parameter in the spectrum-scale-objectizer.conf file. The following example is
on one line:
mmobj config change --ccrfile spectrum-scale-objectizer.conf --section DEFAULT --property
qos_iops_target --value 400
v To disable QOS on ibmobjectizer, set the qos_iops_target to 0. The following example is on one line:
mmobj config change --ccrfile spectrum-scale-objectizer.conf --section DEFAULT --property
qos_iops_target --value 0
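To verify the QoS settings and observe the current I/O consumption on the file system, you can use the
mmlsqos command, for example (gpfs0 is a placeholder file system name):
mmlsqos gpfs0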
Configuring authentication and setting identity management modes for unified file
and object access
You can configure authentication and set the identity management modes for unified file and object
access using the following steps.
The identity management modes for unified file and object access are set in the object-server-sof.conf
file. The default mode is local_mode.
Note: It is important to understand the identity management modes for unified file and object access and
set the mode you want accordingly. Although it is possible to move from one mode to another, some
considerations apply in that scenario.
Important: If you are using unified_mode, the authentication for both file and object access must be
configured and the authentication schemes must be the same and configured with the same server. If not,
the request to create object might fail with user not found error.
Use the following steps on a protocol node to configure authentication and enable unified_mode.
1. Determine which authentication scheme best suits your requirements. You can use either LDAP or AD
with UNIX-mapped domains.
Note: Because object can be configured with only one AD domain, you need to plan which of the
UNIX-mapped AD domains, in case there are trusted domains, is to be configured for object.
2. Configure file access using the mmuserauth command as follows.
mmuserauth service create --data-access-method file
--type ad --servers myADserver --idmap-role master
--netbios-name scale --unixmap-domains ’DOMAIN(5000-20000)’
3. Configure object access using the mmuserauth command as follows.
mmuserauth service create --data-access-method object --type ad
--user-name "cn=Administrator,cn=Users,dc=IBM,dc=local"
--base-dn "dc=IBM,DC=local" --ks-dns-name c40bbc2xn3 --ks-admin-user admin
--servers myADserver --user-id-attrib cn --user-name-attrib sAMAccountName
--user-objectclass organizationalPerson --user-dn "cn=Users,dc=IBM,dc=local"
--ks-swift-user swift
4. Change id_mgmt in the object-server-sof.conf file using the mmobj config change command as
follows.
mmobj config change --ccrfile object-server-sof.conf --section DEFAULT
--property id_mgmt --value unified_mode
5. If object authentication is configured with AD, set ad_domain in the object-server-sof.conf file.
mmobj config change --ccrfile object-server-sof.conf --section DEFAULT
--property ad_domain --value POLLUX
...
Forest: pollux.com
Domain: pollux.com
Domain Controller: win2k8.pollux.com
Pre-Win2k Domain: POLLUX
Pre-Win2k Hostname: WIN2K8
Server Site Name : Default-First-Site-Name
Client Site Name : Default-First-Site-Name
...
Your unified file and object access enabled fileset is now configured with unified_mode.
6. List the currently configured id_mgmt mode using the mmobj config list command as follows.
mmobj config list --ccrfile object-server-sof.conf --section DEFAULT --property id_mgmt
Important:
3. Start using one of these storage policies to create data in a unified file and object access environment.
For more information, see the following:
v “Associating containers with unified file and object access storage policy” on page 273
You must create exports at the container level. If you create a peer or base container directly from NFS
or SMB, containers created from NFS and SMB cannot be multiprotocol.
Associating containers with unified file and object access storage policy
Use the following steps to associate a container with a unified file and object access storage policy.
1. Export common environment variables by sourcing the openrc file:
source ~/openrc
2. Associate a container with a unified file and object access storage policy using the following
command.
swift post container1 --header "X-Storage-Policy: sof-policy1"
In this swift post example, the storage policy is specified with the customized header
X-Storage-Policy using the --header option.
3. Upload an object in the container associated with the unified file and object access storage policy
using the following command.
swift upload container1 imageA.JPG
Note: The steps performed using swift commands can also be done using curl. For more information,
see “curl commands for unified file and object access related user tasks” on page 280.
Creating exports on container associated with unified file and object access
storage policy
Use the following steps to create an NFS or SMB export on the directory that maps to the container
associated with the unified file and object access storage policy.
Create an SMB or NFS export on the directory that maps to the container associated with the unified file
and object access storage policy.
1. Create the NFS export as follows:
mmnfs export add "/ibm/gpfs0/obj_sofpolicy1/s69931509221z1device1/AUTH_763476384728498323747/cont"
2. Create the SMB share as follows:
mmsmb export add smbexport "/ibm/gpfs0/obj_sofpolicy1/s69931509221z1device1/AUTH_763476384728498323747/cont"
Note:
v It is strongly recommended that you create file exports on or below the container path level and not
above it. Creating file exports above the container path level might lead to deletion of the unified file
and object access enabled containers which is undesirable.
v When using the POSIX interface, it is strongly recommended to allow POSIX users to access data only
at or below the container path. Accidental deletion of a container or of data above it might lead to an
inconsistent state of the system.
In a unified file and object access environment, if you want to access files created from file interfaces such
as POSIX, NFS, or CIFS through object interfaces such as curl or swift, you need to make these files
available to the object interface. The ibmobjectizer service, once activated, runs periodically and makes
newly created files available to the object interface.
The purpose of this command is to make certain files available to the object interface sooner (or
immediately), rather than waiting for the objectizer to make them available. This command does not ensure synchronization
between file and object data. Therefore, files deleted are not immediately reflected in the object interface.
Complete synchronization is done by the ibmobjectizer service eventually.
In unified file and object access enabled filesets, files can be accessed from the object interface if you
know the entire URI (including keystone account ID, device etc.) to access that file without the need for
them to be objectized either using the ibmobjectizer service or the mmobj file-access command.
For more information about the mmobj file-access command, see mmobj command in the IBM Spectrum
Scale: Command and Programming Reference.
Before you can use the following steps, IBM Spectrum Scale for object storage must be installed.
This example provides a quick reference of steps performed for unified file and object access. For detailed
information about these steps, see “Administering unified file and object access” on page 269.
1. Enable the file-access object capability as follows.
mmobj file-access enable
2. [Optional] Change the objectizer service interval as follows.
mmobj config change --ccrfile spectrum-scale-objectizer.conf \
--section DEFAULT --property objectization_interval --value 600
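To confirm the new value, you can list the property. This is a sketch that assumes the mmobj config list options mirror those shown for mmobj config change; verify the exact syntax in the mmobj command reference.
mmobj config list --ccrfile spectrum-scale-objectizer.conf \
--section DEFAULT --property objectization_interval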
# echo $FILE_EXPORT_PATH
/ibm/gpfs0/obj_SwiftOnFileFS/s10041510210z1device1/
AUTH_09271462d54b472c82adecff17217586/unified_access
8. Create an SMB share on the path as follows.
mmsmb export add unified_access $FILE_EXPORT_PATH
The system displays output similar to the following:
mmsmb export add: The SMB export was created successfully
9. Create an NFS export on the path.
mmnfs export add $FILE_EXPORT_PATH --client \
"*(Access_Type=RW,Squash=no_root_squash,SecType=sys)"
The system displays output similar to the following:
192.0.2.2: Redirecting to /bin/systemctl stop nfs-ganesha.service
192.0.2.3: Redirecting to /bin/systemctl stop nfs-ganesha.service
192.0.2.2: Redirecting to /bin/systemctl start nfs-ganesha.service
192.0.2.3: Redirecting to /bin/systemctl start nfs-ganesha.service
NFS Configuration successfully changed. NFS server restarted on all NFS nodes.
Note: If this is the first NFS export added to the configuration, the NFS service will be restarted on
the CES nodes where the NFS server is running. Otherwise, no NFS restart is required when adding
an NFS export.
10. Check the NFS and SMB shares.
mmnfs export list
The system displays an output similar to the following:
Path Delegations Clients
----------------------------------------------------------------------------
/ibm/gpfs0/obj_SwiftOnFileFS/
s10041510210z1device1/
AUTH_09271462d54b472c82adecff17217586/unified_access none *
mmsmb export list
Information:
The following options are not displayed because they do not contain a value:
"browseable"
mmgetacl /ibm/gpfs0/obj_SwiftOnFileFS/s10041510210z1device1/
AUTH_09271462d54b472c82adecff17217586/unified_access/DirCreatedFromSMB
#NFSv4 ACL
#owner:ADDOMAINX\administrator
#group:ADDOMAINX\domain users
special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE
(X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN
(X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED
special:group@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE
(X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN
(X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE
(X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN
(X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
You can view the container and the file that were created from the REST interface, and verify that
ownership was retained in the PUT operation, as follows.
ls -l /ibm/gpfs0/obj_SwiftOnFileFS/s10041510210z1device1/
AUTH_09271462d54b472c82adecff17217586/unified_access/DirCreatedFromSMB/File2.txt
Note: The steps performed using swift commands can also be done using curl. For more information,
see “curl commands for unified file and object access related user tasks” on page 280.
Unified file and object access is one of the key features of IBM Spectrum Scale for object storage. It
enables objects to be accessed directly as files through traditional file interfaces such as POSIX, NFS, or
SMB, and vice versa. Using object storage policies for containers, you can have objects that are ingested
into IBM Spectrum Scale for object storage accessed as files, and also make files that are ingested by
using file protocols available for object access.
The following diagram shows an IBM Spectrum Scale object store with unified file and object access. The
object data is available as a file in the same fileset. IBM Spectrum Scale Hadoop connectors allow the data
to be directly leveraged for analytics.
[Figure: IBM Spectrum Scale file system with unified file and object access, showing object (HTTP) access to containers and file access to the same data, with in-place analytics]
Note: An incorrect ETag is corrected when a GET or HEAD request is performed on the object.
v The IBM Spectrum Scale ILM policy rules work with file-extended attributes, and rules can be easily
created based on extended attributes and their values. However, these rules do not work directly over
Swift user-defined metadata. All of Swift user-defined metadata is stored in a single extended attribute
in the IBM Spectrum Scale file system. To create ILM rules, the format and sequence in which the
attributes are stored must be noted. Rules can then be created by constructing wildcard-based filters.
v SELinux must be in the Permissive or Disabled mode to enable object access for the existing filesets.
v Conditional requests such as If-Match and If-None-Match, which perform ETag comparison when used
with the swift or curl client, do not work for existing data that is enabled for object access by using the
mmobj file-access link-fileset command. If the --update-listing option is used, these requests work
after the objectizer service interval has elapsed.
v When a delete operation is triggered on a container that contains linked filesets, the swift and curl
clients might report successful container deletion, but the directory corresponding to the container and
the symlinks of the linked filesets are not deleted and must be deleted manually.
v The swift COPY API (when used on a linked fileset) does not copy the object metadata. The swift POST
API should be used instead.
The swift constraints listed in the following table are also applicable to unified file and object access.
Table 22. Configuration options for [swift-constraints] in swift.conf
Option Limit
MAX_FILE_SIZE 5497558138880 (5 TiB)
MAX_META_NAME_LENGTH 128
MAX_META_VALUE_LENGTH 256
MAX_META_COUNT 90
MAX_META_OVERALL_SIZE 4096
MAX_HEADER_SIZE 8192
CONTAINER_LISTING_LIMIT 10000
ACCOUNT_LISTING_LIMIT 10000
MAX_ACCOUNT_NAME_LENGTH 256
VALID_API_VERSIONS ["v1", "v1.0"]
EXTRA_HEADER_COUNT 0
Note: These values can be changed by using the mmobj config change command on the swift-constraints
section of the swift.conf file.
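For example, a possible command to lower the container listing limit is shown in the following sketch. The lowercase property name is an assumption; verify it against the [swift-constraints] section of your swift.conf before making the change.
mmobj config change --ccrfile swift.conf --section swift-constraints \
--property container_listing_limit --value 5000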
In the following data ingestion example steps performed using curl, this setup is assumed:
v User: "fileuser"
v Password: "Passw0rd6"
v Account: "admin"
v Host: specscaleswift.example.com
1. Obtain the auth token using the following command:
curl -s -i -H "Content-Type: application/json" \
-d '{ "auth": {"identity": {"methods": ["password"],"password": {"user": {"name": "fileuser","domain": { "name": "Default" },"password": "Passw0rd6"}}},"scope": {"project": {"name": "admin","domain": { "name": "Default" }}}}}' \
http://specscaleswift.example.com:35357/v3/auth/tokens
The auth token obtained in this step must be stored in the $AUTH_TOKEN variable.
2. Obtain the project list using the following command:
curl -s -H "X-Auth-Token: $AUTH_TOKEN" http://specscaleswift.example.com:35357/v3/projects
The project ID obtained in this step must be stored in the $AUTH_ID variable.
3. Perform a PUT operation using the following command:
curl -i -s -X PUT --data @/tmp/file.txt -H "X-Auth-Token: $AUTH_TOKEN" \
"http://specscaleswift.example.com:8080/v1/AUTH_$AUTH_ID/RootLevelContainer/TestObj.txt"
This command uploads the /tmp/file.txt file.
4. Set the Age metadata on the uploaded object using the following command:
curl -i -s -X POST -H "X-Auth-Token: $AUTH_TOKEN" -H "X-Object-Meta-Age: 21" \
http://specscaleswift.example.com:8080/v1/AUTH_$AUTH_ID/RootLevelContainer/TestObj.txt
5. Read the metadata using the following command:
curl -i -s --head -H "X-Auth-Token: $AUTH_TOKEN" \
http://specscaleswift.example.com:8080/v1/AUTH_$AUTH_ID/RootLevelContainer/TestObj.txt
curl commands for unified file and object access related user tasks
Use the following curl commands to perform user tasks related to unified file and object access.
For information on changing an option in a configuration file, see “Changing options in configuration
files” on page 284.
When you set the retain_* options to yes, the following attributes are retained:
v The extended attributes in the user namespace except for the user.swift.metadata key which contains
swift metadata and it is expected to be new.
v Windows attributes
When you set the retain_* options to yes, the following attributes are not retained:
v Extended attributes in system, security, and trusted namespaces.
Note: These attributes are also not retained in an object copy operation.
Whether to retain ACLs, Windows attributes, file extended attributes, and ownership when an object is
PUT over an existing object in a unified file and object access enabled container depends on your specific
use case and is at your discretion. For example, if object and file access refer to the same data content,
and the object protocol might completely replace that content so that it is effectively new content from the
file interface as well, you might choose not to retain the existing file ACLs and extended attributes. For
such a use case, you can change the default values so that file ACLs, extended attributes, and ownership
are not retained.
Note: If you are unsure about whether to retain these attributes or not, you might want to use the
default values of retaining ACLs, Windows attributes, file extended attributes, and ownership. The
default values in this case are more aligned with the expected behavior in a multiprotocol setup.
spectrum-scale-object.conf
v Contains cluster or fileset configuration information
v Unique to a site
Table 24. Configurable options for [capabilities] in spectrum-scale-object.conf
Configuration option = Default value Description
file-access-enabled = false The state for the file-access capability. It can be either
true or false.
multi-region-enabled = true The state for the multi-region capability. This option
cannot be changed.
s3-enabled = true The state for the s3 capability. This option cannot be
changed.
spectrum-scale-objectizer.conf
v Contains the ibmobjectizer service configuration information
Table 25. Configuration options for [DEFAULT] in spectrum-scale-objectizer.conf.
Configuration option = Default value Description
objectization_tmp_dir The temporary directory to be used by ibmobjectizer.
This must be a path on any GPFS file system. The
default value is autofilled with the path of the base file
system for object.
objectization_threads = 24 The maximum number of threads that ibmobjectizer will
spawn on a node.
batch_size = 100 The maximum number of files that ibmobjectizer will
process in a thread.
0 means infinite.
object-server.conf file
v Used to set swift timeout values on the lock_path calls to handle GPFS delays better
Table 27. Configuration options for object-server.conf
Configuration option = Default value Description
partition_lock_timeout = 10 The time-out value while the object server tries to
acquire a lock on the partition path during object create,
update, and delete processes. The default value is 10.
/etc/sysconfig/memcached file
v Used to improve the performance of the internal lookups in the framework
Table 28. Configuration options for /etc/sysconfig/memcached
Configuration option = Default value Description
MAXCONN = 4096 The value is set to 4096 unless the current value is higher
than 4096.
CACHESIZE = 2048 The value is set to 2048 unless the current value is higher
than 2048.
proxy-server.conf file
v Used to improve the performance of the internal lookups in the framework
Table 29. Configuration options for proxy-server.conf
Configuration option = Default value Description
memcache_max_connections = 8 The default value is set to 8.
memcache_servers This parameter is dynamically set to the nodes that are
running the object protocol for memcache to work in a
clustered environment.
You can use the mmobj config change command to change the values of the options in the configuration
files. For example:
v Change the value of an option in the [DEFAULT] section of the object-server-sof.conf file as follows:
mmobj config change --ccrfile object-server-sof.conf --section DEFAULT
--property OPTIONNAME --value NEWVALUE
v Change the value of an option in the [IBMOBJECTIZER-LOGGER] section of the spectrum-scale-
objectizer.conf file as follows:
mmobj config change --ccrfile spectrum-scale-objectizer.conf --section IBMOBJECTIZER-LOGGER
--property OPTIONNAME --value NEWVALUE
Note: Only some options are configurable. If an option cannot be changed, it is mentioned in the
respective option description.
Attention: When a configuration file is changed using these commands, it takes several seconds for the
changes to be synced across the whole cluster depending on the size of the cluster. Therefore, when
executing multiple commands to change configuration files, you must plan for an adequate time interval
between the execution of these commands.
In these examples, the steps to back up the Keystone configuration files and database are not given;
backing them up is the user's responsibility. You can use OpenStack backup procedures for this task. For
more information on OpenStack backup procedures, see Chapter 14, Backup and Recovery.
Note:
v The same version of the IBM Spectrum Protect backup-archive client must be installed on all of the
nodes that are running the mmbackup command.
v For more information on IBM Spectrum Protect requirements for the mmbackup command, see IBM
Spectrum Scale requirements in IBM Spectrum Scale: Administration Guide.
Store all relevant cluster and file system configuration data in a safe location outside your GPFS cluster
environment. This data is essential to restoring your object storage quickly, so you might want to store it
in a site in a different geographical location for added safety.
Note: The sample file system used throughout this procedure is called smallfs. Replace this value with
your file system name wherever necessary.
1. Back up the cluster configuration information.
The cluster configuration must be backed up by the administrator. The following cluster configuration
information is necessary for the backup:
v IP addresses
where
-N Specifies the nodes that are involved in the backup process. These nodes must be
configured for the IBM Spectrum Protect server that is being used.
-S Specifies the global snapshot name to be used for the backup.
--tsm-servers
Specifies which IBM Spectrum Protect server is used as the backup target, as specified in
the IBM Spectrum Protect client configuration dsm.sys file.
There are several other parameters available for the mmbackup command that influence the backup
process, and the speed with which it handles the system load. For example, you can increase the
number of backup threads per node by using the -m parameter. For the full list of parameters
available, see the mmbackup command in the IBM Spectrum Scale: Command and Programming
Reference.
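For reference, a possible invocation that combines the parameters described above might look like the following sketch. The node names and thread count are illustrative only; the file system, snapshot, and server names reuse the sample values from this procedure.
mmbackup smallfs -N cesnode1,cesnode2 -S objects_globalsnap1 \
--tsm-servers tsm1 -m 4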
d. Issue the following command to remove the snapshot that was created in step 6a:
mmdelsnapshot <file system device> <snapshot name>
For example:
mmdelsnapshot smallfs objects_globalsnap1
Note: All IBM Spectrum Scale object nodes and IBM Spectrum Protect client nodes must be available
when the object storage configuration and contents are being restored.
After you perform the prerequisite procedures, you can begin the recovery procedure.
Note: The sample file system that is used throughout this procedure is called smallfs. Replace this value
with your file system name wherever necessary.
1. Retrieve the base file system configuration information.
Use the mmrestoreconfig command to generate a configuration file that contains the details of the
former file system. For example:
mmrestoreconfig smallfs -i /tmp/smallfs.bkpcfg.out925 -F smallfsQueryResultFile
2. Re-create the NSDs if they are missing.
Using the output file that is generated in the previous step as a guide, the administrator might need
to re-create NSD devices for use with the restored file system. In the output file, the NSD
configuration section contains the NSD information. For example:
######## NSD configuration ##################
## Disk descriptor format for the mmcrnsd command.
## Please edit the disk and desired name fields to match
## your current hardware settings.
##
## The user then can uncomment the descriptor lines and
## use this file as input to the -F option.
#
# %nsd:
# device=DiskName
# nsd=nsd8
# usage=dataAndMetadata
# failureGroup=-1
# pool=system
#
If changes are needed, edit the file in a text editor and follow the included instructions to use it as
input for the mmcrnsd command, then issue the following command:
mmcrnsd -F StanzaFile
3. Re-create the base file system.
The administrator must re-create the initial file system. The output query file created in step 1 can be
used as a guide. The following example shows the section of this file that is needed when re-creating
the file system:
######### File system configuration #############
## The user can use the predefined options/option values
## when recreating the file system. The option values
## represent values from the backed up file system.
#
# mmcrfs FS_NAME NSD_DISKS -j cluster -k posix -Q yes -L 4194304 --disable-fastea -T /smallfs -A no --inode-limit 278016
#
4. Restore the essential file system configuration.
The essential file system configuration can be restored to the file system that was created in the
previous step by using the mmrestoreconfig command. For example:
mmrestoreconfig smallfs -i /tmp/smallfs.bkpcfg.out925
5. Issue the following command to mount the object file system on all nodes:
mmmount <file system device> -a
For example, mount the file system with the following command:
mmmount smallfs -a
You can create separate restore jobs by splitting a single restore task into several smaller ones. One way
to do this is to specify the restore path for the object data that is deeper in the IBM Spectrum Scale object
path.
For example, instead of starting the restore with the root of the IBM Spectrum Scale object path, start the
object restore at the virtual devices level. If you have 40 virtual devices that are configured, you might
start 40 independent restore jobs to restore the object data, and distribute the jobs to the different IBM
Spectrum Protect client nodes. Additionally, you can start a single restore job for all of the files under the
account and container path.
With this approach, care must be taken not to overload the IBM Spectrum Protect client nodes or the IBM
Spectrum Protect server. You might want to experiment to determine the best mix of jobs.
For example, if there are four IBM Spectrum Scale object nodes, each with the IBM Spectrum Protect
client installed and configured, you might use the following types of commands:
1. On the first IBM Spectrum Scale object node, run a restore job for each of the first 10 virtual devices
by running the following commands:
dsmc restore /gpfs0/objectfs/o/z1device0 -subdir=yes -disablenqr=no \
-servername=tsm1
dsmc restore /gpfs0/objectfs/o/z1device1 -subdir=yes -disablenqr=no \
-servername=tsm1
#<repeat for z1device2 - z1device9>
2. On the second node, run a restore job for each of the next 10 virtual devices. Continue the pattern on
the remaining IBM Spectrum Scale object nodes so that all of the virtual devices under the o directory
are restored.
The most efficient restore approach depends on many factors, including the number of tape drives, IBM
Spectrum Protect client configuration, and network bandwidth. You might need to experiment with your
configuration to determine the optimal restore strategy.
The object service needs constant network connectivity among all the configured CES IP addresses. The
standard configuration uses all the available CES IP addresses.
[Figure: standard object network configuration using the CES IP addresses on LAN1]
If a cluster configuration has isolated node and network groups and CES IP addresses have been
assigned to those groups, parts of the object store are not accessible.
In such a configuration, a network and node group that must be used for object can be configured in the
spectrum-scale-object.conf file. Only the CES IP addresses of this group are used for object.
Note: If the Singleton and Database attributes of the IP assignments are changed manually by using the
mmces address change command, only the IP addresses that belong to the object group can be used.
Configuration example
In the following example, LAN1 is used as the object group; IP1, IP2, and IP3 are used for object, and the
object store is fully available. If only LAN1 is used, object services are limited to Node1, Node2, and
Node3. To distribute the object load to all the nodes in the cluster, create a network group (LAN4 in the
following figure) that includes all the nodes.
[Figure: object network configuration with the LAN4 group spanning all nodes]
1. To set up the object group, create the LAN4 group and add nodes to it by running the following
command:
mmchnode --ces-group=LAN4 -N Node1,Node2,Node3,Node4,Node5,Node6,Node7
If you want to add CES IP addresses to the group, run the following command:
mmces address add --ces-ip ces_ip8,ces_ip9,ces_ip10,ces_ip11,ces_ip12,ces_ip13,ces_ip14 \
--ces-group LAN4
If you want to move the existing CES IP addresses to the group, run the following command:
mmces address change --ces-ip ces_ip8,ces_ip9,ces_ip10,ces_ip11,ces_ip12,ces_ip13,ces_ip14 \
--ces-group LAN4
2. To set up the object group when object has already been configured, run the following command. The
following command is on one line:
mmobj config change --ccrfile spectrum-scale-object.conf --section node-group
--property object-node-group --value LAN4
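To verify the result, you can list the CES addresses and the configured object node group. This is a sketch; the exact listing options and output format may differ, so check the mmces and mmobj command references.
mmces address list
mmobj config list --ccrfile spectrum-scale-object.conf \
--section node-group --property object-node-group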
Note:
The temporary files generated by Swift do not need to be moved between storage tiers because they are
all eventually replaced with permanent files that have the .data extension. Moving temporary files to the
system, gold, or silver storage pools results in unnecessary data movement.
2. To enable the object heatmap policy for unified file and object access, identify the filename prefix for
temporary files created by Swift in unified file and object access. The filename prefix is configured in
object-server-sof.conf and can be fetched:
grep tempfile_prefix /etc/swift/object-server-sof.conf
tempfile_prefix = .ibmtmp_
3. Determine the filesets that are enabled for unified file and object access:
mmobj policy list
Index Name Default Deprecated Fileset Functions
----------------------------------------------------------------------------------------
0 SwiftDefault yes obj_fset
1317160 Sof obj_Sof file-and-object-access
4. Create a heat-based migration rule by creating a policy file (for example, object_heat_policy) with the
following rules:
RULE 'DefineTiers' GROUP POOL 'TIERS'
IS 'system' LIMIT(70)
THEN 'gold' LIMIT(75)
THEN 'silver'
RULE 'Rebalance' MIGRATE FROM POOL 'TIERS'
TO POOL 'TIERS' WEIGHT(FILE_HEAT)
FOR FILESET('obj_Sof')
WHERE NAME NOT LIKE '.ibmtmp_%'
Note:
v The fileset name is derived from Step 3. Multiple fileset names can be separated by comma.
v The filename prefix in the WHERE clause is derived from Step 2. By using this filter, the migration of
temporary files is skipped, thereby avoiding unnecessary data movement.
5. To test the policy, run the following command:
mmapplypolicy fs1 -P object_heat_policy -I test
[I] GPFS Current Data Pool Utilization in KB and %
Pool_Name KB_Occupied KB_Total Percent_Occupied
gold 169462784 6836715520 2.478716330%
silver 136192 13673431040 0.000996034%
system 8990720 13673431040 0.065753211%
[I] 6050 of 42706176 inodes used: 0.014167%.
[I] Loaded policy rules from object_heat_policy.
Evaluating policy rules with CURRENT_TIMESTAMP = 2015-11-22@02:30:19 UTC
Parsed 2 policy rules.
RULE ’DefineTiers’ GROUP POOL ’TIERS’ IS ’system’ LIMIT(70)THEN ’gold’ LIMIT(75)THEN ’silver’
RULE ’Rebalance’ MIGRATE FROM POOL ’TIERS’ TO POOL ’TIERS’ WEIGHT(computeFileHeat
(CURRENT_TIMESTAMP-ACCESS_TIME,xattr(’gpfs.FileHeat’), KB_ALLOCATED))FOR FILESET(’Object_Fileset’)
WHERE NAME LIKE ’%.data’
[I] 2015-11-22@02:30:20.045 Directory entries scanned: 1945.
[I] Directories scan: 1223 files, 594 directories, 128 other objects, 0 ’skipped’ files and/or errors.
[I] 2015-11-22@02:30:20.050 Sorting 1945 file list records.
[I] Inodes scan: 1223 files, 594 directories, 128 other objects, 0 ’skipped’files and/or errors.
[I] 2015-11-22@02:30:20.345 Policy evaluation. 1945 files scanned.
[I] 2015-11-22@02:30:20.350 Sorting 1 candidate file list records.
[I] 2015-11-22@02:30:20.437 Choosing candidate files. 1 records scanned.
[I] Summary of Rule Applicability and File Choices:
Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule
0 98080572 3353660160 39384107 1328389120 292416 RULE ’Clean’
MIGRATE FROM POOL ’TIERS’ WEIGHT(.) TO POOL ’TIERS’ FOR FILESET(.) WHERE(.)
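When the test output looks correct, the same policy file can be applied for real by running mmapplypolicy without -I test (or with -I yes, which is the default action). A minimal sketch, assuming the same file system and policy file names as in the test:
mmapplypolicy fs1 -P object_heat_policy -I yes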
Quotas are enabled by the system administrator when control over the amount of space used by the
individual users, groups of users, or individual filesets is required. By default, user and group quota
limits are enforced across the entire file system. Optionally, the scope of quota enforcement can be limited
to individual fileset boundaries.
Note: A large number of quota records per file system can result from the following scenarios:
v There are a very large number of users, groups, or filesets.
v If the --perfileset-quota option is enabled, the number of possible quota records is the number of
filesets times the number of users and groups.
GUI navigation
To work with this function in the GUI, log on to the IBM Spectrum Scale GUI and select Files > Quotas.
Note: Windows nodes may be included in clusters that use GPFS quotas. However, Windows nodes do
not support the quota commands.
Once GPFS quota management has been enabled, you may establish quota values by:
v Setting default quotas for all new users, groups of users, or filesets.
v Explicitly establishing or changing quotas for users, groups of users, or filesets.
v Using the gpfs_quotactl() subroutine.
To disable quota management, run the mmchfs -Q no command. All subsequent mounts will obey the
new quota setting.
For complete usage information, see the mmcheckquota command, the mmchfs command, the mmcrfs
command, and the mmedquota command in the IBM Spectrum Scale: Command and Programming Reference. For
additional information on quotas, see the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Default quotas
Default quota limits can be set for new users, groups, and filesets for a specified file system. Default
quota limits can also be applied at a more granular level for new users and groups in a specified fileset.
When default quotas are managed at the fileset level, those quotas have a higher priority than those set
at the file system level. If the status of the fileset-level defaults for one fileset is Initial, they will inherit
default limits from global fileset-level defaults. The status of newly added fileset-level default quotas can
be one of the following:
Initial When the fileset is created, it is in this state. User and group quota accounts under the fileset do
not follow the fileset defaults.
Quota on
All user and group quota accounts under the fileset that are created later will follow the fileset
quota limits.
Note: Although GPFS quotas do not explicitly interact with Swift quotas, it still might be useful to
employ GPFS quotas to limit the amount of space or the number of inodes that is consumed by the
object store. To do this, define GPFS quotas on the top-level independent fileset by specifying the
maximum size or maximum inode usage that the object store can consume. See Chapter 6 Swift
feature overview of the IBM Redpaper for details.
Note: If --perfileset-quota is in effect, users and groups in the fileset root are not affected by default
quotas unless quotas are explicitly set for them.
The -Q yes and --perfileset-quota options are specified when creating a file system with the mmcrfs
command or changing file system attributes with the mmchfs command. Use the mmlsfs command to
display the current settings of these quota options.
2. Enable default quotas with the mmdefquotaon command.
For complete usage information, see the mmchfs command, mmcrfs command, mmdefedquota command,
mmdefquotaoff command, mmdefquotaon command, mmedquota command, mmlsfs command, and mmsetquota
command in the IBM Spectrum Scale: Command and Programming Reference.
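For example, a minimal sketch of checking the current quota settings and enabling default quotas, assuming a file system named fs1, might look like the following. The flag usage is illustrative; verify the exact options in the mmlsfs and mmdefquotaon command references.
mmlsfs fs1 -Q --perfileset-quota
mmdefquotaon -u -g -j fs1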
Quotas are stored and enforced in the file system. See Chapter 20, “Managing GPFS quotas,” on page 293
for details on how to enable and use quotas.
v SMB protocol and quotas
For SMB clients, quotas can limit the used and free space reported to clients:
– If the SMB option gpfs:dfreequota is set, the user quota for the current user and the group quota for
the user's primary group are queried during the free space query:
Note: For including fileset quotas in the reported free space, configure the underlying file system
with the --filesetdf flag (in mmcrfs or mmchfs). It is not possible to query or change individual
quotas from a SMB client system.
v NFS protocol and quotas
It is not possible to query or change individual quotas from a NFS client system. User and group
quotas are not included in the reported free space to a client. To include fileset quotas in the reported
space to a client, configure the underlying file system with the --filesetdf flag (in mmcrfs or mmchfs).
v Object protocol and quotas
– There are two applicable levels of quotas: The quotas in the file system (see Chapter 20, “Managing
GPFS quotas,” on page 293) and the quotas managed by Swift.
– Swift quotas can be used as account and container quotas. The quota values are set as account and
container metadata entries. For more information, see the OpenStack Swift documentation.
– When using file system quotas, it is important to consider that all objects stored by Swift are stored
with the same owner and owning group (swift:swift).
When setting quota limits for a file system, replication within the file system should be considered. See
“Listing quotas” on page 301.
The mmedquota command opens a session using your default editor, and prompts you for soft and hard
limits for blocks and inodes. For example, to set user quotas for user jesmith, enter:
mmedquota -u jesmith
Note: A quota limit of zero indicates no quota limits have been established.
The current (in use) block and inode usage is for display only; it cannot be changed. When establishing a
new quota, zeros appear as limits. Replace the zeros, or old values if you are changing existing limits,
with values based on the user's needs and the resources available. When you close the editor, GPFS
checks the values and applies them. If an invalid value is specified, GPFS generates an error message. If
this occurs, reenter the mmedquota command. If the scope of quota limit enforcement is the entire file
system, mmedquota will list all instances of the same user (for example, jesmith) on different GPFS file systems.
You may find it helpful to maintain a quota prototype, a set of limits that you can apply by name to any
user, group, or fileset without entering the individual values manually. This makes it easy to set the same
limits for all. The mmedquota command includes the -p option for naming a prototypical user, group, or
fileset on which limits are to be based. The -p flag can only be used to propagate quotas from filesets
within the same file system.
For example, to set group quotas for all users in a group named blueteam to the prototypical values
established for prototeam, issue:
mmedquota -g -p prototeam blueteam
You may also reestablish default quotas for a specified user, group of users, or fileset when using the -d
option on the mmedquota command.
Note: You can use the mmsetquota command as an alternative to the mmedquota command.
For complete usage information, see the mmedquota command and the mmsetquota command in the IBM
Spectrum Scale: Command and Programming Reference.
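For example, the following sketch sets limits for user jesmith with the mmsetquota command on an assumed file system named fs1; the limit values are illustrative only.
mmsetquota fs1 --user jesmith --block 5G:6G --files 10K:12K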
Note: If --perfileset-quota is in effect, users and groups in the fileset root are not affected by default
quotas unless quotas are explicitly set for them.
The -Q yes and --perfileset-quota options are specified when creating a file system with the mmcrfs
command or changing file system attributes with the mmchfs command. Use the mmlsfs command to
display the current settings of these quota options.
Here are some examples:
a. A GPFS cluster is created with configuration profile file, example.profile, which contains the
following lines:
%filesystem
quotasAccountingEnabled=yes
quotasEnforced=user;group;fileset
perfilesetQuotas=yes
When a file system is created, those quota attributes will be set automatically. Quota accounting
will be enabled on a perfileset basis for users and groups, and quotas will automatically be
enforced. This means that when a quota is reached, the end user will not be able to add more data
to the file system.
mmcrfs fs5 nsd8
A listing of the file system config, using the mmlsfs command, will show the following attributes
and values, having been set by the mmcrfs command:
mmlsfs fs5
Checking quotas
The mmcheckquota command counts inode and space usage for a file system and writes the collected
data into quota files.
You must use the mmcheckquota command if any of the following are true:
1. Quota information is lost due to node failure.
Node failure could leave users unable to open files or deny them disk space that their quotas should
allow.
2. The in doubt value approaches the quota limit. To see the in doubt value, use the mmlsquota or
mmrepquota commands.
As the sum of the in doubt value and the current usage may not exceed the hard limit, the actual
block space and number of files available to the user, group, or fileset may be constrained by the in
doubt value. Should the in doubt value approach a significant percentage of the quota, use the
mmcheckquota command to account for the lost space and files.
Note: Running mmcheckquota is also recommended (in an appropriate time slot) if the following
message is displayed after running mmrepquota, mmlsquota, or mmedquota:
Quota accounting information is inaccurate and quotacheck must be run.
When issuing the mmcheckquota command on a mounted file system, negative in doubt values may be
reported if the quota server processes a combination of up-to-date and back-level information. This is a
transient situation and may be ignored.
For example, to check quotas for the file system fs1 and report differences between calculated and
recorded disk quotas, enter:
mmcheckquota -v fs1
The information displayed shows that the quota information for USR7 was corrected. Due to a system
failure, this information was lost at the server, which recorded 0 subblocks and 0 files. The current usage
data counted is 96 subblocks and 3 files. This is used to update the quota:
fs1: quota check found the following differences:
USR7: 96 subblocks counted (was 0); 3 inodes counted (was 0)
Note: In cases where small files do not have an additional block allocated for them, quota usage may
show less space usage than expected.
For complete usage information, see the mmcheckquota command in the IBM Spectrum Scale: Command and
Programming Reference.
Listing quotas
The mmlsquota command displays the file system quota limits, default quota limits, and current usage
information.
If the scope of quota limit enforcement is the entire file system, mmlsquota -u or mmlsquota -g will list
all instances of the same user or group on different GPFS file systems. If the quota enforcement is on a
per-fileset basis, mmlsquota -u or mmlsquota -g will list all instances of the same user or group on
different filesets on different GPFS file systems.
GPFS quota management takes replication into account when reporting on and determining whether
quota limits have been exceeded for both block and file usage. If either data or metadata replication is
enabled, the values reported by both the mmlsquota command and the mmrepquota command may
exceed the corresponding values reported by commands like ls, du, and so on. The difference depends on
the level of replication and on the number of replicated file system objects. For example, if data block
replication is set to 2, and if all files are replicated, then the reported block usage by the mmlsquota and
mmrepquota commands will be double the usage reported by the ls command.
When the mmlsquota command is issued, negative in doubt values may be reported if the quota server
processes a combination of up-to-date and back-level information. This is a transient situation and may
be ignored.
Display the quota information for one user, group of users, or fileset with the mmlsquota command. If
none of the options -g, -u, or -j are specified, the default is to display only user quotas for the user who
issues the command.
To display default quota information, use the -d option with the mmlsquota command. For example, to
display default quota information for users of all the file systems in the cluster, issue this command:
mmlsquota -d -u
In this example, file system fs1 shows that the default block quota for users is set at 5 GB for the soft
limit and 6 GB for the hard limit. For file system fs2, no default quotas for users have been established.
When mmlsquota -d is specified in combination with the -u, -g, or -j options, default file system quotas
are displayed. When mmlsquota -d is specified without any of the -u, -g, or -j options, default
fileset-level quotas are displayed.
If you issue the mmlsquota command with the -e option, the quota system collects updated information
from all nodes before returning output. If the node to which in-doubt space was allocated should fail
before updating the quota system about its actual usage, this space might be lost. Should the amount of
space in doubt approach a significant percentage of the quota, run the mmcheckquota command to
account for the lost space.
To collect and display updated quota information about a group named blueteam, specify the -g and -e
options:
mmlsquota -g blueteam -e
For complete usage information, see the mmlsquota command in the IBM Spectrum Scale: Command and
Programming Reference.
You can have quotas activated automatically whenever the file system is mounted by specifying the
quota option (-Q yes) when creating (mmcrfs -Q yes) or changing (mmchfs -Q yes) a GPFS file system.
When creating a file system, the default is to not have quotas activated, so you must specify this option if
you want quotas activated.
The mmquotaon command is used to turn quota limit checking back on if it had been deactivated by
issuing the mmquotaoff command. Specify the file system name, and whether user, group, or fileset
quotas are to be activated. If you want all three types of quotas activated (user, group, and fileset), specify
only the file system name. After quotas have been turned back on, issue the mmcheckquota command to
count inode and space usage.
For example, to activate user quotas on the file system fs1, enter:
mmquotaon -u fs1
For complete usage information, see the mmquotaon command and the mmlsfs command in the IBM
Spectrum Scale: Command and Programming Reference.
If this occurs, use the mmcheckquota command after reactivating quotas to reconcile allocation data.
When quota enforcement is deactivated, disk space and file allocations are made without regard to limits.
The mmquotaoff command is used to deactivate quota limit checking. Specify the file system name and
whether user, group, or fileset quotas, or any combination of these three, are to be deactivated. If you
want all types of quotas deactivated, specify only the file system name.
For example, to deactivate only user quotas on the file system fs1, enter:
mmquotaoff -u fs1
For complete usage information, see the mmquotaoff command and the mmlsfs command in the IBM
Spectrum Scale: Command and Programming Reference.
The scope of quota enforcement can be changed using the mmchfs command and specifying either the
--perfileset-quota or --noperfileset-quota option as needed.
After changing the scope of quota enforcement, mmcheckquota must be run to properly update the
quota usage information.
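For example, assuming a file system named fs1, switching to per-fileset enforcement and refreshing the usage information might look like this:
mmchfs fs1 --perfileset-quota
mmcheckquota fs1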
Specify whether you want to list only user quota information (-u flag), group quota information (-g flag),
or fileset quota information (-j flag) in the mmrepquota command. The default is to summarize all three
quotas. If the -e flag is not specified, there is the potential to display negative usage values as the quota
server may process a combination of up-to-date and back-level information. See “Listing quotas” on page
301.
If the scope of quota limit enforcement is the entire file system, mmrepquota -u or mmrepquota -g will
list all users or groups on different GPFS file systems. If the quota enforcement is on a per-fileset basis,
mmrepquota -u or mmrepquota -g will list all instances of the same user or group on different filesets on
different GPFS file systems.
To list the group quotas (-g option) for all file systems in the cluster (-a option), and print a report with
header lines (-v option), enter:
mmrepquota -g -v -a
For complete usage information, see the mmrepquota command in the IBM Spectrum Scale: Command and
Programming Reference.
Additional details about the three scenarios for restoring GPFS quota files follow.
In scenarios 1 and 3:
This command must be run offline (that is, with the file system unmounted on all nodes).
2. This will restore the user quota limits set for the file system, but the usage information will not be
current. To bring the usage information to current values, the command must be reissued:
mmcheckquota fs1
In scenario 1, if you want to nullify all quota configuration and then reinitialize it, follow these steps:
1. Remove the existing quota files that are corrupted.
a. Disable quota management:
mmchfs fs1 -Q no
b. Remove the user.quota, group.quota, and fileset.quota files.
2. Enable quota management.
a. Issue the following command:
mmchfs fs1 -Q yes
3. Reestablish quota limits by issuing the mmedquota command or the mmdefedquota command.
4. Gather the current quota usage values by issuing the mmcheckquota command.
In scenario 2, quota files do not exist externally. Therefore, use mmbackupconfig and mmrestoreconfig to
restore quota configurations.
For complete usage information, see the mmcheckquota command, the mmdefedquota command, and the
mmedquota command in the IBM Spectrum Scale: Command and Programming Reference.
You can manage GUI users either locally within the system or in an external authentication server such
as Microsoft Active Directory (AD) or Lightweight Directory Access Protocol Server (LDAP). By default,
the IBM Spectrum Scale system uses an internal authentication repository for GUI users. To use an
external AD or LDAP server, you need to disable the internal user repository that is used for the GUI
user management and enable the LDAP/AD repository. For more information on how to disable the
internal repository and enable the external repository, see “Managing GUI users in an external AD or
LDAP server” on page 309.
You can create users who can perform different administrative tasks on the system. Each user must be
part of one or more user groups that are defined on the system. When you create a new user, you
assign the user to one of the default user groups or to a custom user group. User groups are assigned
predefined roles that authorize the users within that group to perform a specific set of operations on the
GUI.
Use the Services > GUI page to create users and add them to a user group.
Predefined roles are assigned to user groups to define the working scope within the GUI. If a user is
assigned to more than one user group, the permissions are additive, not restrictive. The predefined role
names cannot be changed.
The IBM Spectrum Scale system is delivered with a default GUI user named admin. This user is also
stored in the local repository. You can log in to the system by using this user name to create additional
GUI users and groups in local user repository.
Use the various controls that are available under the Password Policy tab of the Services > GUI page to
enforce strong passwords for the users. You can modify or expire the passwords of individual users or of
all users that are created in the system. If a password is set as expired, the user is prompted to change it
at the next login.
Note: Only users with User Administrator role can modify the password policy of a user.
User groups
Users who are part of Security Administrator and User Administrator user groups can create role-based
user groups where any users that are added to the group adopt the role that is assigned to that group.
Roles apply to users on the system and are based on the user group to which the user belongs. A user
can be part of multiple user groups so that a single user can play multiple roles in the system. You can
assign the following roles to your user groups:
v Administrator
Users can access all functions on the GUI except those that deal with managing users and user groups.
v Security Administrator
Users can access all functions on the GUI, including managing users and user groups.
v System Administrator
Users manage clusters, nodes, and alert logs.
v Storage Administrator
Users manage disks, file systems, pools, and filesets.
v Snapshot Administrator
Users manage snapshots for file systems and filesets.
v Monitor
Users can view objects and system configuration but cannot configure, modify, or manage the system
or its resources.
v Data Access
Users can perform the following tasks:
– Edit owner, group, and ACL of any file or path through the Access > File System ACL > Files and
Directories page.
– Edit owner, group, and ACL for a non-empty directory of a file system, fileset, NFS export, or SMB
share.
– Create or delete object containers through the Object > Accounts page.
v Protocol Administrator
Users manage object storage and data export definitions of SMB and NFS protocols.
v User Administrator
Users manage GUI users and user groups.
By default, the IBM Spectrum Scale system uses an internal authentication repository for the GUI administrators.
You can configure an external authentication server by performing the following steps:
1. Create your AD or LDAP configuration by issuing the mkldap command at the following location:
/usr/lpp/mmfs/gui/cli/mkldap.
This command writes the configuration automatically to /opt/ibm/wlp/usr/servers/gpfsgui/
ldap.xml, which is then distributed across all GUI nodes. For secure AD or LDAP connection, make
sure that the keystores are present on the respective GUI nodes.
The mkldap command accepts the following parameters.
Table 30. mkldap command parameters
Parameters Description
id Unique ID of the LDAP configuration.
--host The IP address or host name of the LDAP server.
--baseDn BaseDn string for the repository.
--bindDn BindDn string for the authentication user.
--bindPassword Password of the authentication user.
--port Port number of the LDAP. Default is 389 or 636 over SSL.
--type Repository type such as ad, ids, domino, secureway, iplanet,
netscape, edirectory or custom. Default is ad.
--connectTimeout Maximum time for establishing a connection with the
LDAP server. Default value is 1m.
--searchTimeout Maximum time for an LDAP server to respond before a
request is canceled. Default value is 1m.
--keystore Location with file name of the keystore file (.jks, .p12 or
.pfx).
--keystorePassword Password of the keystore.
--truststore Location with file name of the truststore file (.jks, .p12 or
.pfx).
--truststorePassword Password of the truststore.
--userFilter User filter for the LDAP repository.
--userIdMap User ID map for the LDAP repository.
--groupFilter Group filter for the LDAP repository.
--groupIdMap Group ID map for the LDAP repository.
--groupMemberIdMap Group member ID map for the LDAP repository.
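For illustration only, a sketch of an mkldap invocation might look like the following. The configuration ID, host, and DN values are placeholders, and the assumption that the ID is given as the first positional argument should be verified against the command help before use.
/usr/lpp/mmfs/gui/cli/mkldap myldap --host ldap.example.com \
--baseDn "dc=example,dc=com" --bindDn "cn=admin,dc=example,dc=com" \
--bindPassword Secret1 --type ids --port 389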
Note: Configurations managed by the mkldap and rmldap commands are not overwritten during an
upgrade. That is, you do not need to back up the configuration data.
Management of GPFS access control lists (ACLs) and NFS export includes these topics:
v “Traditional GPFS ACL administration”
v “NFS V4 ACL administration” on page 315
v Chapter 23, “Native NFS and GPFS,” on page 337
If you are using NFS V4 ACLs, see “NFS V4 ACL administration” on page 315. Both ACL types may
coexist in a single GPFS file system.
Traditional GPFS ACLs are based on the POSIX model. Traditional GPFS access control lists (ACLs)
extend the base permissions, or standard file access modes, of read (r), write (w), and execute (x) beyond
the three categories of file owner, file group, and other users, to allow the definition of additional users
and user groups. In addition, GPFS introduces a fourth access mode, control (c), which can be used to
govern who can manage the ACL itself.
In this way, a traditional ACL can be created that looks like this:
#owner:jesmith
#group:team_A
user::rwxc
group::rwx-
other::--x-
mask::rwxc
user:alpha:r-xc
group:audit:r-x-
group:system:rwx-
In this ACL:
v The first two lines are comments showing the file's owner, jesmith, and group name, team_A
v The next three lines contain the base permissions for the file. These three entries are the minimum
necessary for a GPFS ACL:
1. The permissions set for the file owner (user), jesmith
2. The permissions set for the owner's group, team_A
3. The permissions set for other groups or users outside the owner's group and not belonging to any
named entry
v The next line, with an entry type of mask, contains the maximum permissions allowed for any entries
other than the owner (the user entry) and those covered by other in the ACL.
v The last three lines contain additional entries for specific users and groups. These permissions are
limited by those specified in the mask entry, but you may specify any number of additional entries up
to a memory page (approximately 4 K) in size.
Each GPFS file or directory has an access ACL that determines its access privileges. These ACLs control
who is allowed to read or write at the file or directory level, as well as who is allowed to change the
ACL itself.
In addition to an access ACL, a directory may also have a default ACL. If present, the default ACL is used
as a base for the access ACL of every object created in that directory. This allows a user to protect all files
in a directory without explicitly setting an ACL for each one.
When a new object is created, and the parent directory has a default ACL, the entries of the default ACL
are copied to the new object's access ACL. After that, the base permissions for user, mask (or group if
mask is not defined), and other, are changed to their intersection with the corresponding permissions
from the mode parameter in the function that creates the object.
If the new object is a directory, its default ACL is set to the default ACL of the parent directory. If the
parent directory does not have a default ACL, the initial access ACL of newly created objects consists
only of the three required entries (user, group, other). The values of these entries are based on the mode
parameter in the function that creates the object and the umask currently in effect for the process.
GUI navigation
To work with this function in the GUI, log on to the IBM Spectrum Scale GUI and select Access > File
System ACL.
Use the mmputacl command to set the access ACL of a file or subdirectory, or the default ACL of a
directory. For example, to set the ACL for a file named project2.history, we can create a file named
project2.acl that contains:
user::rwxc
group::rwx-
other::--x-
mask::rwxc
user:alpha:r-xc
group:audit:rw--
group:system:rwx-
In this example,
v The first three lines are the required ACL entries setting permissions for the file's owner, the owner's
group, and for processes that are not covered by any other ACL entry.
v The last three lines contain named entries for specific users and groups.
Once you are satisfied that the correct permissions are set in the ACL file, you can apply them to the
target file with the mmputacl command. For example, to set permissions contained in the file project2.acl
for the file project2.history, enter:
mmputacl -i project2.acl project2.history
Although you can issue the mmputacl command without using the -i option to specify an ACL input file,
and make ACL entries through standard input, you will probably find the -i option more useful for
avoiding errors when creating a new ACL.
For complete usage information, see the mmputacl command and the mmgetacl command in the IBM
Spectrum Scale: Command and Programming Reference.
The first two lines are comments displayed by the mmgetacl command, showing the owner and owning
group. All entries containing permissions that are not allowed (because they are not set in the mask
entry) display with a comment showing their effective permissions.
For complete usage information, see the mmgetacl command in the IBM Spectrum Scale: Command and
Programming Reference.
For complete usage information, see the mmgetacl command and the mmputacl command in the IBM
Spectrum Scale: Command and Programming Reference.
The current ACL entries are displayed using the default editor, provided that the EDITOR environment
variable specifies a complete path name. When the file is saved, the system displays information similar
to:
mmeditacl: 6027-967 Should the modified ACL be applied? (yes) or (no)
For complete usage information, see the mmeditacl command in the IBM Spectrum Scale: Command and
Programming Reference.
You cannot delete the base permissions. These remain in effect after this command is executed.
This is because NFS V4 Linux servers handle NFS V4 ACLs by translating them into POSIX ACLs. For
more information, see “Linux ACLs and extended attributes” on page 344.
Note:
This topic applies only to kernel NFS and does not refer to the NFS Server function included with CES.
For information, see “Authorizing protocol users” on page 319.
With AIX, the file system must be configured to support NFS V4 ACLs (with the -k all or -k nfs4 option
of the mmcrfs or mmchfs command). The default for the mmcrfs command is -k all.
With Linux, the file system must be configured to support POSIX ACLs (with the -k all or -k posix
option of the mmcrfs or mmchfs command).
Depending on the value (posix | nfs4 | all) of the -k parameter, one or both ACL types can be allowed
for a given file system. Since ACLs are assigned on a per-file basis, this means that within the same file
system one file may have an NFS V4 ACL, while another has a POSIX ACL. The type of ACL can be
changed by using the mmputacl or mmeditacl command to assign a new ACL or by the mmdelacl
command (causing the permissions to revert to the mode, which is, in effect, a POSIX ACL). At any point
in time, only a single ACL can be associated with a file. Access evaluation is done as required by the ACL
type associated with the file.
NFS V4 ACLs are represented in a completely different format than traditional ACLs. For detailed
information on NFS V4 and its ACLs, refer to NFS Version 4 Protocol and other information found in the
Network File System Version 4 (nfsv4) section of the IETF Datatracker website (datatracker.ietf.org/wg/
nfsv4/documents).
In the case of NFS V4 ACLs, there is no concept of a default ACL. Instead, there is a single ACL and the
individual ACL entries can be flagged as being inherited (either by files, directories, both, or neither).
Consequently, specifying the -d flag on the mmputacl command for an NFS V4 ACL is an error.
As in traditional ACLs, users and groups are identified by specifying the type and name. For example,
group:staff or user:bin. NFS V4 provides for a set of special names that are not associated with a specific
local UID or GID. These special names are identified with the keyword special followed by the NFS V4
name. These names are recognized by the fact that they end with the character '@'. For example,
special:owner@ refers to the owner of the file, special:group@ the owning group, and special:everyone@
applies to all users.
The next two lines provide a list of the available access permissions that may be allowed or denied, based
on the ACL type specified on the first line. A permission is selected using an 'X'. Permissions that are not
specified by the entry should be left marked with '-' (minus sign).
special:owner@:rwxc:allow:DirInherit:InheritOnly
(X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (-)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
user:smithj:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (-)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
In this table, the columns refer to the ACL entry for a given file, and the rows refer to the ACL entry for
its parent directory. The various combinations of these attributes produce one of these results:
Permit
Indicates that GPFS permits removal of a file with the combination of file and parent directory
ACL entries specified. (Other permission checking may exist within the operating system as well.)
Deny Indicates that GPFS denies (does not permit) removal of a file with the combination of file and
parent directory ACL entries specified.
Removal of a file includes renaming the file, moving the file from one directory to another even if the file
name remains the same, and deleting it.
Table 31. Removal of a file with ACL entries DELETE and DELETE_CHILD

Parent directory ACL entry                               ACL allows   ACL denies   DELETE not   UNIX mode
                                                         DELETE       DELETE       specified    bits only
ACL allows DELETE_CHILD                                  Permit       Permit       Permit       Permit
ACL denies DELETE_CHILD                                  Permit       Deny         Deny         Deny
DELETE_CHILD not specified                               Permit       Deny         Deny         Deny
UNIX mode bits only - wx permissions allowed             Permit       Permit       Permit       Permit
UNIX mode bits only - no w or no x permissions allowed   Permit       Deny         Deny         Deny
The UNIX mode bits are used in cases where the ACL is not an NFS V4 ACL.
It can also be the case that NFS V4 ACLs have been set for some file system objects (directories and
individual files) prior to administrator action to revert back to a POSIX-only configuration. Since the NFS
V4 access evaluation will no longer be performed, it is desirable for the mmgetacl command to return an
ACL representative of the evaluation that will now occur (translating NFS V4 ACLs into traditional
POSIX style). The -k posix option returns the result of this translation.
Users may need to see ACLs in their true form as well as how they are translated for access evaluations.
There are four cases:
1. By default, the mmgetacl command returns the ACL in a format consistent with the file system
setting:
v If posix only, it is shown as a traditional ACL.
v If nfs4 only, it is shown as an NFS V4 ACL.
v If all formats are supported, the ACL is returned in its true form.
2. The command mmgetacl -k nfs4 always produces an NFS V4 ACL.
3. The command mmgetacl -k posix always produces a traditional ACL.
4. The command mmgetacl -k native always shows the ACL in its true form, regardless of the file
system setting.
In general, users should continue to use the mmgetacl and mmeditacl commands without the -k flag,
allowing the ACL to be presented in a form appropriate for the file system setting. Since the NFS V4
ACLs are more complicated and therefore harder to construct initially, users that want to assign an NFS
V4 ACL should use the command mmeditacl -k nfs4 to start with a translation of the current ACL, and
then make any necessary modifications to the NFS V4 ACL that is returned.
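For example, assuming a hypothetical file /gpfs/fs1/project/file1, the following sketch opens the NFS V4
translation of the current ACL in the editor that the EDITOR environment variable points to:
export EDITOR=/usr/bin/vi
mmeditacl -k nfs4 /gpfs/fs1/project/file1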
The lines that follow the first one are then processed according to the rules of the expected ACL type.
special:owner@:----:deny
(-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (-)SYNCHRONIZE (-)READ_ACL (-)READ_ATTR (X)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (X)WRITE_NAMED
user:guest:r-xc:allow
(X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (-)READ_ATTR (-)READ_NAMED
(X)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
user:guest:----:deny
(-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (-)SYNCHRONIZE (-)READ_ACL (-)READ_ATTR (X)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED
This ACL shows four ACL entries (an allow and deny entry for each of owner@ and guest).
In general, constructing NFS V4 ACLs is more complicated than traditional ACLs. Users new to NFS V4
ACLs may find it useful to start with a traditional ACL and allow either mmgetacl or mmeditacl to
provide the NFS V4 translation, using the -k nfs4 flag as a starting point when creating an ACL for a
new file.
When assigning an ACL to a file that already has an NFS V4 ACL, there are some NFS rules that must be
followed. Specifically, in the case of a directory, there will not be two separate (access and default) ACLs,
as there are with traditional ACLs. NFS V4 requires a single ACL entity and allows individual ACL
entries to be flagged if they are to be inherited. Consequently, mmputacl -d is not allowed if the existing
ACL was the NFS V4 type, since this attempts to change only the default ACL. Likewise mmputacl
(without the -d flag) is not allowed because it attempts to change only the access ACL, leaving the
default unchanged. To change such an ACL, use the mmeditacl command to change the entire ACL as a
unit. Alternatively, use the mmdelacl command to remove an NFS V4 ACL, followed by the mmputacl
command.
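For example, a hedged sketch that assumes a hypothetical file /gpfs/fs1/project/file1 and an NFS V4 ACL
saved in /tmp/newacl.txt:
mmeditacl /gpfs/fs1/project/file1                      # change the entire ACL as a unit
mmdelacl /gpfs/fs1/project/file1                       # or remove the NFS V4 ACL first
mmputacl -i /tmp/newacl.txt /gpfs/fs1/project/file1    # and then apply the new ACL from a file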
The GPFS file system supports storing POSIX and NFSv4 ACLs to authorize file protocol users.
SMB service maps the NFSv4 ACL to a Security Descriptor for SMB clients to form the ACLs. That is, the
SMB ACL is derived from the NFSv4 ACL; it is not a separate ACL. Any changes from SMB clients on
ACLs are mapped back to the ACLs in the file system.
To get the expected behavior of ACLs, you must configure the file system to use only the NFSv4 ACLs.
The default configuration profiles (/usr/lpp/mmfs/profiles) that are included with IBM Spectrum Scale
contain the required configuration for NFSv4 ACLs in the file system. When you manually create a file
system for protocol usage with the mmcrfs command, use the -k nfs4 option to establish the correct ACL
setting. For more information, see the mmcrfs command and the mmchfs command in the IBM Spectrum
Scale: Command and Programming Reference.
With the SMB and NFS protocols, you can manage the ACL permissions on files and directories from
connected file systems. ACLs from both protocols are mapped to the same ACL in the file system. The
ACL supports inheritance and you can control the inheritance by using the special inheritance flags.
When you create the directory for an SMB or NFS export, you must ensure that the ACL on the exported
folder is set up as you want. One option is to change the owner of the export folder to an administrator,
who can then access the new export and change the ACL from an SMB or NFS client. This approach is a
good practice to follow for setting up SMB shares for use with Windows clients.
It is a good practice to manage ACLs from the ACL management interface on a connected client system.
For example, after you create the directories for an SMB or NFS export, you can set the owner of the
directory with the chown command. Then the user can connect to the export with the SMB or NFS
protocol to see and manage the ACLs that are associated with the directory over which the export is
created.
Changing the POSIX mode bits also modifies the ACL of an object. When you use ACLs for access
control, it is a good practice to ensure that ACLs are not replaced with permissions from POSIX mode
bits. You can configure this behavior with the --allow-permission-change parameter in the mmcrfileset
and mmchfileset commands.
An ACL extends the base permissions or the standard file access modes such as read, write, and execute.
ACLs are compatible with UNIX mode bits. When an NFS client issues the chmod command, the access
privileges that are defined in the ACL are overwritten by the privileges that are derived from the UNIX
mode bits; by default, the ACL is replaced whenever the chmod command is submitted. To allow proper
use of ACLs, it is a good practice to prevent chmod from overwriting the ACLs by setting this parameter
to setAclOnly or chmodAndSetAcl.
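For example, a sketch that assumes a hypothetical file system fs1 with a fileset named fset1:
mmchfileset fs1 fset1 --allow-permission-change setAclOnly       # permissions are changed through ACL commands only
mmchfileset fs1 fset1 --allow-permission-change chmodAndSetAcl   # chmod and ACL changes are both honored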
NFSv3 clients can set and read the POSIX mode bits; an NFSv3 client that sets the UNIX permissions
modifies the ACL to match those permissions. In most NFS-only cases, the POSIX permissions are
used directly. For NFSv3 clients, file sharing with SMB access protection is done by using NFSv4 ACLs
but NFSv3 clients can see only the mapping of ACLs to traditional UNIX access permissions. The full
NFSv4 ACLs are enforced on the server.
The permissions from the NFSv4 ACL entries special:owner@ are shown as the POSIX permission bits for
the file owner, special:group@ are shown as the POSIX permission bits for the group, and
special:everyone@ are shown for the POSIX permissions for “other”.
SMB share ACLs apply only to SMB exports and they are separate from the file system ACLs. To access
data through an SMB export, users need access both in the share-level ACL and in the ACL of the file or
directory.
SMB share ACLs can be changed either through the MMC on a Windows client or through the mmsmb
exportacl command. For more information, see the mmsmb exportacl command on any protocol node.
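For example, a hedged sketch that assumes a hypothetical export named smbexport; check the mmsmb
command description for the exact exportacl subcommands and options available on your release:
mmsmb exportacl list smbexport      # display the share-level ACL of the export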
Note: With automatic ID mapping (autorid), Samba assigns a UID and a GID with the same numeric
value to each SID. This causes an SMB Security Descriptor entry to always be mapped to an NFSv4 ACL
group entry, because SIDs do not distinguish between a user and a group.
The mapping of entries from an SMB Security Descriptor to a NFSv4 ACL is done according to the
following table:
Table 32. Mapping from SMB Security Descriptor to NFSv4 ACL entry

Entry in SMB Security Descriptor              Mapped to NFSv4 ACL entry
Principal        Inheritance                  Principal            Inheritance
EVERYONE         Any                          special:everyone@    Same
CREATOR OWNER    Subfolders or files          special:owner@       Same, with InheritOnly added
The mapping from the NFSv4 ACL to an SMB Security Descriptor is done according to the following
table:
Table 33. Mapping from NFSv4 ACL entry to SMB Security Descriptor

Entry in NFSv4 ACL                                        Mapped to SMB Security Descriptor entry
Principal           Inheritance                           Principal        Inheritance
special:everyone@   Any                                   EVERYONE         Same
special:owner@      Includes FileInherit or DirInherit    CREATOR OWNER    Same, with the exclusion of the
                                                                           current file or folder added
special:owner@      Applies to the current folder         Mapped user      Same
                    (not InheritOnly)

Note: An entry that applies to the current file or folder and is marked for inheritance can result in
mapping one NFSv4 entry into two Security Descriptor entries.
Independent of this mapping, Microsoft Windows Clients expect the ACL to be stored in a canonical
order. To avoid problems with presenting differently ordered ACLs to these clients, manage the ACLs for
SMB clients that run Microsoft Windows from a Microsoft Windows system.
ACL inheritance
The inheritance flags in ACL entry of parent directories are used to control the inheritance of
authorization to the child files and directories. The inheritance flag gives you the granularity to specify
whether the inheritance defined in an ACL entry applies to the current directory and its children or only
to the subdirectories and files that are contained in the parent directory. ACL entries are inherited to the
child directories or files at the time of creation. Changes made to the ACL of a parent directory are not
propagated to child directories or files. However, in the case of SMB, you can propagate inheritance
changes from a parent to all of its children by using File Explorer, the command line, or PowerShell.
The NFSV4 protocol uses the following flags to specify and control inheritance information of the ACEs:
v FileInherit: Indicates that this ACE must be added to each new non-directory file created. This flag is
signified by ‘f’ or file_inherit.
v DirInherit: Indicates that this ACE must be added to each new directory created. This flag is signified
by ‘d’ or dir_inherit.
v InheritOnly: Indicates that this ACE is not applied to the parent directory itself, but only inherited by
its children. This flag is signified by ‘i’ or inherit_only.
v NoPropagateInherit: Indicates that the ACL entry must be included in the initial ACL for subdirectories
that are created in this directory but not further propagated to subdirectories created below that level.
In case of SMB, the following list shows how the inheritance flags are linked to the Microsoft Windows
inheritance modes:
v This folder only (No bits)
v This folder, subfolder, and files (FileInherit, DirInherit)
v This folder and subfolders (DirInherit)
v This folder and files (FileInherit)
v Subfolders and files only (FileInherit, DirInherit, InheritOnly)
v Subfolders only (DirInherit, InheritOnly)
v Files only (FileInherit, InheritOnly)
The recommended way to manage access is per group instead of per individual user. This way, users can
be easily added to or removed from the group. Providing ACLs to groups has an added advantage of
managing inheritance easily for the whole group of users simultaneously. If individual users are added
directly to ACLs and you need to make a change, you need to update ACLs of all corresponding
directories and files. On the authentication server like Active Directory or LDAP, you can create groups
and add users as members and use these groups to give respective access to data.
If you need to set ACLs for individual users where data is created by users in folders that are created by
others, it is recommended that you explicitly add the users who need ACLs on that export.
In mixed mode, where the share is used for both NFS and SMB access, the parent owner might experience
loss of access to the child directories or files. To avoid such a problem, it is recommended that you
provide ACLs explicitly to each user.
The special owner and group dynamically refer to the owner and group of the directory or file that the
ACL belongs to. For example, if the owner of a file is changed, all special:owner@ entries in the ACL
refer to the new owner. In the case of inheritance, this leads to some complexity because those special
entries point to the owner and group of the child directory or file that inherits the entry. In many cases,
the children do not have the same owner and group as the parent directory. Therefore, the special entries
in parent and children refer to different users. This can be avoided by adding static entries (user:'name' or
group:'name') to the ACL. These static entries are inherited by name and refer everywhere to the same
users. But they are not updated if the owner of the parent is changed. The general recommendation is not
to use special:owner@ and special:group@ together with inheritance flags. For more information, see the
mmputacl command.
The inheritance of ACL from the owner of a directory to subdirectories and files works only for
subdirectories and files that have the same owner as the parent directory. A subdirectory or file that is
created by a different owner does not inherit the ACL of a parent directory that is owned by another
user.
In the case of special access to NFSV4 exports, parent owners might experience loss of access to their
child folders and files. To avoid such a problem in mixed mode, it is recommended that you provide
ACLs to groups rather than to individual users.
Table 35. ACL permissions required to work on files and directories, while using SMB protocol (table 2 of 2)

ACL Operation           Write extended   Write        Delete subfolder   Delete   Read          Write         Take
                        attribute        attributes   and files                   permissions   permissions   ownership
Execute file
List folder
Read data from file
Read attributes
Create file
Create folder
Write data to file      X                X
Write file attributes                    X
Table 36. ACL permissions required to work on files and directories, while using NFS protocol (table 1 of 2)

ACL Operation             Traverse folder /   List folder /   Read        Read extended   Create files /   Create folders /
                          execute file        read data       attribute   attribute       write data       append data
Execute file              P, X                X
List folder               P                   X
Read data from file       P                   X
Read attributes           P
Create file               P                                                               P
Create folder             P                                                                                P
Write data to file        P                                                               X                X
Write file attributes     P
Write folder attributes   P
Delete file               P                   P
Delete folder             P                   P
Rename file               P                   X                                           P
Rename folder             P                   X                                           P                P
Read file ACL             P
Read folder ACL           P
Write file ACL            P
Table 37. ACL permissions required to work on files and directories, while using NFS protocol (table 2 of 2)

ACL Operation             Write extended   Write        Delete subfolder   Delete   Read ACL   Write ACL   Take
                          attribute        attributes   and files                                          ownership
Execute file
List folder
Read data from file
Read attributes
Create file
Create folder
Write data to file
Write file attributes
Write folder attributes
Delete file                                             P
Delete folder                                           P
Rename file                                             P
Rename folder                                           P
Read file ACL
Read folder ACL
Write file ACL                                                                               X
Write folder ACL                                                                             X
Take file ownership                                                                                        X
Take folder ownership                                                                                      X
The following are the considerations on the ACL read and write permissions:
1. The files that require "Traverse folder / execute file" permission do not require the "Bypass Traverse
Check" attribute to be enabled. This attribute is enabled by default on the files.
2. The "Read extended attribute" permission is required by the SMB clients with recent Microsoft
Windows versions (for Microsoft Windows 2008, Microsoft Windows 2012, and Microsoft Windows 8
versions) for file copy operations. The default ACLs set without inheritance do not contain this
permission. It is recommended that you use inherited permissions where possible and enable this
permission in the inherited permissions to prevent the default value to be used and cause problems.
The file system must be created with the native ACL type set to NFS V4. It is recommended to use the
default configuration profiles (/usr/lpp/mmfs/profiles) that are included with IBM Spectrum Scale; they
contain the required configuration for NFSV4 ACLs in the file system.
Perform the following steps to apply default ACLs on SMB and NFS exports; a consolidated command
sketch follows these steps:
1. Create a fileset or directory in the file system as shown in the following example:
mkdir -p /ibm/gpfs0/testsmbexport
2. Change the owner and group of the fileset or directory using chown and chgrp respectively. For
example:
chown -R "DOMAIN\\username":"DOMAIN\\groupname" /ibm/gpfs0/testsmbexport
3. Use the mmputacl or mmeditacl commands to set the wanted ACE along with specific ACE for owner
user and owner group and inheritance flags for the fileset or directory.
4. Check the ACL setting for the fileset or directory by using the mmgetacl command.
5. Create the desired SMB or NFS export by using the mmnfs or mmsmb commands over the fileset or
directory.
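A consolidated sketch of these steps, reusing the paths from the examples above with a placeholder AD
domain user and group:
mkdir -p /ibm/gpfs0/testsmbexport
chown -R "DOMAIN\\username":"DOMAIN\\groupname" /ibm/gpfs0/testsmbexport
export EDITOR=/usr/bin/vi
mmeditacl -k nfs4 /ibm/gpfs0/testsmbexport     # add the wanted ACEs and inheritance flags (FileInherit, DirInherit)
mmgetacl -k nfs4 /ibm/gpfs0/testsmbexport      # verify the resulting ACL
mmsmb export add testsmbexport /ibm/gpfs0/testsmbexport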
Perform the following steps to create an SMB share and view the owner of the export:
1. Submit the mmsmb export add command to create an SMB share as shown in the following example:
mmsmb export add testsmbexport /ibm/gpfs0/testsmbexport
2. Issue either the ls -la command or the mmgetacl command to view the owner of the export. For
example:
ls -la /ibm/gpfs0/testsmbexport
Or
mmgetacl /ibm/gpfs0/testsmbexport
Apart from the tasks that are listed earlier in this section, the following table provides a quick overview
of the tasks that can be performed to manage ACLs and the corresponding IBM Spectrum Scale
command.
Table 38. Commands and reference to manage ACL tasks.

Tasks that can be performed to manage ACLs                Command                 Reference topic
Applying ACL at file system, fileset, and export level    mmeditacl               "Applying an existing NFS V4 access control list" on page 318
Inserting ACEs in existing ACLs                           mmeditacl               "Changing NFS V4 access control lists" on page 318
Modifying ACLs                                            mmeditacl               "Changing NFS V4 access control lists" on page 318
Copying access control list entries                       mmeditacl               "Changing NFS V4 access control lists" on page 318
Replacing a complete ACL                                  mmputacl or mmeditacl   "Changing NFS V4 access control lists" on page 318
Replacing all entries for a specific user inside an ACL   mmeditacl               "Changing NFS V4 access control lists" on page 318
Controlling inheritance of entries inside an ACL          mmputacl or mmeditacl
Deleting a complete ACL                                   mmdelacl                "Deleting NFS V4 access control lists" on page 319
Deleting specific ACL entries                             mmeditacl               "Changing NFS V4 access control lists" on page 318
Deleting the ACL entry for a user                         mmeditacl               "Changing NFS V4 access control lists" on page 318
Displaying an ACL                                         mmgetacl                "Displaying NFS V4 access control lists" on page 318
Changing a file system directory's owner and group        chown or chgrp
Displaying a file system directory's owner and group      ls -l or mmgetacl
For SMB shares, it is recommended to manage the ACLs from a Windows client. The following
operations are included in creating an SMB share:
1. Create the folder to export in the file system with the mkdir command.
2. Change the owner of the exported folder to a user who configures the initial ACLs.
3. Create the export using the mmsmb export add command.
4. Using a Windows client machine, access the newly created share as the user specified in step 2.
5. Right-click on the shared folder, and select Properties.
6. Select the Security tab and then select Advanced to navigate to the more detailed view of
permissions.
7. Add and remove permissions as required.
Access for object users to Object Storage projects is controlled by user roles and container
ACLs. Based on the roles defined for the user, object users can be administrative users and
non-administrative users. Non-admin users can only perform operations per container based on the
container’s X-Container-Read and X-Container-Write ACLs. Container ACLs can be defined to limit access
to objects in swift containers. Read access can be limited to only allow download, or allow download and
listing. Write access allows the user to upload new objects to a container.
You can use an external AD or LDAP server or a local database as the back-end to store and manage user
credentials for user authentication. The authorization details such as relation of users with projects and
roles are maintained locally by the keystone server. The customer can select the authentication server to
be used. For example, if AD is existing in an enterprise deployment and the users in AD are required to
access object data, the customer can decide to use AD as the back-end authentication server.
When the back-end authentication server is AD or LDAP, the user management operations such as
creating a user and deleting a user are the responsibility of the AD or LDAP administrator, who can
optionally also be the Keystone server administrator. When local authentication is used for object access,
the user management operations are done by the Keystone administrator. In case of authorization, the
management tasks such as creating roles, projects, and associating the user with them is done by the
Keystone Administrator. The Keystone administration can be done through the Keystone V3 REST API or
by using an OpenStack python-based client.
Before you start creating object users, and projects, ensure that Keystone server is configured and the
authentication servers are set up properly. You can use the mmces service list -a -v command to see
whether Keystone is configured properly.
The object users are authorized to the object data and resources by creating and managing roles and
ACLs. The roles and ACLs define the actions that can be performed by the user on the object resources
such as accessing data, managing the projects, creating projects, read, write, and run permissions.
Creating containers:
The Object Storage organizes data in account, container, and object. Each account and container is an
individual database that is distributed across the cluster. An account database contains the list of
containers in that account. A container database contains the list of objects in that container.
It is the responsibility of the Keystone server administrator to create and manage accounts. The account
defines a namespace for containers. A container must be unique within the owning account, and an
account must use a unique name within the project. The admin account is created by default.
GUI navigation
To work with this function in the IBM Spectrum Scale GUI, log on to the GUI and select Object >
Containers.
1. Issue the swift post container command to create a container by using the Swift Command Line
Client. In the following example, the Keystone administrator creates a public_readOnly container in
admin account.
# swift post public_readOnly --os-auth-url http://tully-ces-ip.adcons.spectrum:35357/v3
--os-project-name admin --os-project-domain-name Default --os-username admin
--os-user-domain-name Default --os-password Passw0rd --auth-version 3
2. Issue the swift list command to list the containers that are available for the account. In the
following example, the system lists the containers that are available in the admin project.
# swift list --os-auth-url http://tully-ces-ip.adcons.spectrum:35357/v3
--os-project-name admin --os-project-domain-name Default --os-username admin
--os-user-domain-name Default --os-password Passw0rd --auth-version 3
public_readOnly
3. Issue the swift stat command to list the accounts, containers, or objects details. In the following
example, the system displays the admin account details.
# swift stat -v --os-auth-url http://tully-ces-ip.adcons.spectrum:35357/v3
--os-project-name admin --os-project-domain-name Default --os-username admin
--os-user-domain-name Default --os-password Passw0rd --auth-version 3
StorageURL: http://tully-ces-ip.adcons.spectrum:8080/v1
/AUTH_bea5a0c632e54eaf85e9150a16c443ce
Auth Token: 1f6260c4f8994581a465b8225075c932
Account: AUTH_bea5a0c632e54eaf85e9150a16c443ce
Containers: 1
Objects: 0
Bytes: 0
Containers in policy "policy-0": 1
Objects in policy "policy-0": 0
Bytes in policy "policy-0": 0
X-Account-Project-Domain-Id: default
X-Timestamp: 1432766053.43581
X-Trans-Id: tx9b96c4a8622c40b3ac69a-0055677ce7
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes
In the following example, the system displays the details of the public_readOnly container in the admin
account:
# swift stat public_readOnly -v --os-auth-url http://tully-ces-ip.adcons.spectrum:35357/v3
--os-project-name admin --os-project-domain-name Default --os-username admin
--os-user-domain-name Default --os-password Passw0rd --auth-version 3
URL: http://tully-ces-ip.adcons.spectrum:8080/v1/AUTH_bea5a0c632e54eaf85e9150a16c443ce
/public_readOnly
To list operator_roles in all other cases, issue the mmobj config list command with the following parameters:
mmobj config list --ccrfile proxy-server.conf --section filter:keystone --property operator_roles
The Keystone administrator can also control access to the objects in a container by using an access
control list (ACL). The following example shows that when a member of the admin account tries to
display the details of the public_readOnly container, the process fails because it does not have an operator
role or access control defined:
# swift stat public_readOnly -v --os-auth-url http://tully-ces-ip.adcons.spectrum:35357/v3
--os-project-name admin --os-project-domain-name Default --os-username member
--os-user-domain-name Default --os-password Passw0rd --auth-version 3
Container HEAD failed: http://tully-ces-ip.adcons.spectrum:8080/v1
/AUTH_bea5a0c632e54eaf85e9150a16c443ce/public_readOnly 403 Forbidden
Related tasks:
“Creating read ACLs to authorize object users”
The Keystone administrator can create container ACLs to grant read permissions using X-Container-Read
headers in the curl tool or the --read-acl flag in the Swift Command Line Client.
“Creating write ACLs to authorize object users” on page 333
The Keystone administrator can create container ACLs to grant write permissions using
X-Container-Write headers in the curl tool or the --write-acl flag in the Swift Command Line Client.
The Keystone administrator can create container ACLs to grant read permissions using X-Container-Read
headers in the curl tool or the --read-acl flag in the Swift Command Line Client.
Note: The .r:* ACL specifies access for any referrer regardless of account affiliation or user name. The
.rlistings ACL allows listing the containers and reading (downloading) objects.
Note: Use a comma (,) to separate ACLs. For example, --read-acl admin:admin,students:student1.
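For example, a hedged sketch (reusing the hypothetical endpoint and credentials from the earlier examples)
that grants read access on the public_readOnly container to the student1 user of the students project and to
any referrer:
# swift post public_readOnly --read-acl ".r:*,.rlistings,students:student1"
--os-auth-url http://tully-ces-ip.adcons.spectrum:35357/v3
--os-project-name admin --os-project-domain-name Default --os-username admin
--os-user-domain-name Default --os-password Passw0rd --auth-version 3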
Related tasks:
“Creating containers” on page 330
The Object Storage organizes data in account, container, and object. Each account and container is an
individual database that is distributed across the cluster. An account database contains the list of
containers in that account. A container database contains the list of objects in that container.
“Creating write ACLs to authorize object users”
The Keystone administrator can create container ACLs to grant write permissions using
X-Container-Write headers in the curl tool or the --write-acl flag in the Swift Command Line Client.
The Keystone administrator can create container ACLs to grant write permissions using
X-Container-Write headers in the curl tool or the --write-acl flag in the Swift Command Line Client.
# curl -i http://tully-ces-ip.adcons.spectrum:8080/v1/AUTH_bea5a0c632e54eaf85e9150a16c443ce
/writeOnly -X PUT -H "Content-Length: 0" -H "X-Auth-Token: ${token}" -H
"X-Container-Write: admin:member,students:student1" -H "X-Container-Read: "
HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
X-Trans-Id: txf7b0bfef877345949c61c-005567b9d1
Date: Fri, 29 May 2015 00:58:57 GMT
# curl -i http://tully-ces-ip.adcons.spectrum:8080/v1/AUTH_bea5a0c632e54eaf85e9150a16c443ce
/writeOnly/imageA.JPG -X PUT -H "X-Auth-Token: ${token}" --upload-file imageA.JPG
HTTP/1.1 100 Continue
HTTP/1.1 201 Created
Last-Modified: Fri, 29 May 2015 01:11:28 GMT
Content-Length: 0
Etag: 95d8c44b757f5b0c111750694dffef2b
Content-Type: text/html; charset=UTF-8
X-Trans-Id: tx6caa0570bfcd419782274-005567bcbe
Date: Fri, 29 May 2015 01:11:28 GMT
3. Query the imageA.JPG object in the writeOnly container as the student1 user of the students project.
This operation fails because the user does not have the required read privileges.
# curl -i http://tully-ces-ip.adcons.spectrum:8080/v1/AUTH_bea5a0c632e54eaf85e9150a16c443ce
/writeOnly/imageA.JPG -X HEAD -H "X-Auth-Token: ${token}"
HTTP/1.1 403 Forbidden
Content-Type: text/html; charset=UTF-8
X-Trans-Id: tx4f7dfbfd74204785b6b50-005567bd8c
Content-Length: 0
Date: Fri, 29 May 2015 01:14:52 GMT
4. Grant read permissions to student1 user of the students project:
token=$(openstack --os-auth-url http://tully-ces-ip.adcons.spectrum:35357/v3
--os-project-name admin --os-project-domain-name Default --os-username admin
--os-user-domain-name Default --os-password Passw0rd --os-identity-api-version 3
token issue | grep -w "id" | awk '{print $4}')
# curl -i http://tully-ces-ip.adcons.spectrum:8080/v1/AUTH_
bea5a0c632e54eaf85e9150a16c443ce
/writeOnly -X POST -H "Content-Length: 0" -H "X-Auth-Token:
${token}" -H "X-Container-Read: students:student1"
HTTP/1.1 204 No Content
Content-Length: 0
Content-Type: text/html; charset=UTF-8
X-Trans-Id: tx77aafe0184da4b68a7756-005567beac
Date: Fri, 29 May 2015 01:19:40 GMT
5. Verify that the student1 user now has read access.
token=$(openstack --os-auth-url http://tully-ces-ip.adcons.spectrum:35357/v3
--os-project-name students --os-project-domain-name Default --os-username student1
--os-user-domain-name Default --os-password Passw0rd --os-identity-api-version 3
token issue | grep -w "id" | awk '{print $4}')
# curl -i http://tully-ces-ip.adcons.spectrum:8080/v1/AUTH_bea5a0c632e54eaf85e9150a16c443ce
/writeOnly -X GET -H "X-Auth-Token: ${token}"
HTTP/1.1 200 OK
Content-Length: 11
X-Container-Object-Count: 1
Accept-Ranges: bytes
X-Storage-Policy: Policy-0
X-Container-Bytes-Used: 5552466
X-Timestamp: 1432861137.91693
Content-Type: text/plain; charset=utf-8
X-Trans-Id: tx246b39018a5c4bcb90c7f-005567bff3
Date: Fri, 29 May 2015 01:25:07 GMT
imageA.JPG
Authorization limitations
Authorization limitations are specific to the protocols that are used to access data.
For more information on the known limitations of the NFSV4 ACLs, see “Exceptions and limitations to
NFS V4 ACLs support” on page 344.
Note: None of these sections takes into account the NFS server integration that is introduced with CES.
The interactions between the integrated NFS server and the topics that are documented in the following
sections will be addressed in a future release.
Note:
v Ensure that all GPFS file systems that you use to export data via NFS are mounted with the
syncnfs option to prevent clients from running into data integrity issues during failover. Before
mounting a GPFS file system, it is a good practice to set syncnfs as the default mount option by
running the mmchfs command with -o syncnfs.
v Ensure that NFS clients mount with the -o hard option to prevent any application failures during
network failures or node failovers.
v If caching is enabled on the NFS clients, files that are migrated to the cloud storage tier by using
Transparent cloud tiering remain in the co-resident status, and the capacity is not freed from the file
system. However, if caching is disabled, the files are moved to the non-resident status and the
capacity is freed, at the cost of a negative impact on performance. Therefore, there is a tradeoff
between capacity and performance, and administrators must make a judicious decision depending
on the business requirements.
2. Make sure that the clocks of all nodes in the GPFS cluster are synchronized. If this is not done, NFS
access to the data, as well as other GPFS file system operations, may be disrupted.
Export considerations
Keep these points in mind when exporting a GPFS file system to NFS. The operating system being used
and the version of NFS might require special handling or consideration.
For Linux nodes only, issue the exportfs -ra command to initiate a reread of the /etc/exports file.
Starting with Linux kernel version 2.6, an fsid value must be specified for each GPFS file system that is
exported on NFS. For example, the format of the entry in /etc/exports for the GPFS directory /gpfs/dir1
might look like this:
/gpfs/dir1 cluster1(rw,fsid=745)
The administrator must assign fsid values subject to the following conditions:
1. The values must be unique for each file system.
2. The values must not change after reboots. The file system should be unexported before any change is
made to an already assigned fsid.
3. Entries in the /etc/exports file are not necessarily file system roots. You can export multiple directories
within a file system. In the case of different directories of the same file system, the fsids must be
different. For example, in the GPFS file system /gpfs, if two directories are exported (dir1 and dir2),
the entries might look like this:
/gpfs/dir1 cluster1(rw,fsid=745)
/gpfs/dir2 cluster1(rw,fsid=746)
4. If a GPFS file system is exported from multiple nodes, the fsids should be the same on all nodes.
Configuring the directories for export with NFSv4 differs slightly from the previous NFS versions. To
configure the directories, do the following:
1. Define the root of the overall exported file system (also referred to as the pseudo root file system) and
the pseudo file system tree. For example, to define /export as the pseudo root and export /gpfs/dir1
and /gpfs/dir2 which are not below /export, run:
mkdir -m 777 /export /export/dir1 /export/dir2
mount --bind /gpfs/dir1 /export/dir1
mount --bind /gpfs/dir2 /export/dir2
In this example, /gpfs/dir1 and /gpfs/dir2 are bound to a new name under the pseudo root using the
bind option of the mount command. These bind mount points should be explicitly unmounted after
GPFS is stopped and bind-mounted again after GPFS is started. To unmount, use the umount
command. In the preceding example, run:
umount /export/dir1; umount /export/dir2
The two exported directories (with their newly bound paths) are entered into the /etc/exports file.
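For example, the resulting /etc/exports entries might look like the following sketch; the client name
cluster1 and the fsid values are placeholders, and fsid=0 marks the NFS V4 pseudo root on Linux:
/export       cluster1(rw,fsid=0)
/export/dir1  cluster1(rw,fsid=745)
/export/dir2  cluster1(rw,fsid=746)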
Large installations with hundreds of compute nodes and a few login nodes or NFS-exporting nodes
require tuning of the GPFS parameters maxFilesToCache and maxStatCache with the mmchconfig
command.
This tuning is required for the GPFS token manager (file locking), which can handle approximately
1,000,000 files in memory. By default, each node holds 5000 tokens, so the token manager tracks a total
number of tokens that equals 5000 * (number of nodes). On large configurations, this total can exceed
the memory limit of the token manager.
| For information about the default values of maxFilesToCache and maxStatCache, see the description of the
| maxStatCache attribute in the topic mmchconfig command in the IBM Spectrum Scale: Command and
| Programming Reference.
| In versions of IBM Spectrum Scale earlier than 5.0.2, the stat cache is not effective on the Linux platform
| unless the Local Read-Only Cache (LROC) is configured. For more information, see the description of the
| maxStatCache parameter in the topic mmchconfig command in the IBM Spectrum Scale: Command and
| Programming Reference.
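A hedged tuning sketch with placeholder values and a hypothetical node class named nfsNodes for the
NFS-exporting nodes; the new values take effect after GPFS is restarted on those nodes:
mmchconfig maxFilesToCache=100000,maxStatCache=10000 -N nfsNodes
mmlsconfig maxFilesToCache
mmlsconfig maxStatCache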
If you are running SLES 9 SP1, the kernel defines the sysctl variable fs.nfs.use_underlying_lock_ops,
which determines whether the NFS lockd is to consult the file system when granting advisory byte-range
locks. For distributed file systems like GPFS, this must be set to true (the default is false).
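For example, on a kernel that defines this variable, you could enable it as follows and also add the setting
to /etc/sysctl.conf to make it persistent across reboots:
sysctl -w fs.nfs.use_underlying_lock_ops=1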
For additional considerations when NFS exporting your GPFS file system, refer to File system creation
considerations topic in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
To export a GPFS file system using NFS V4, there are two file system settings that must be in effect.
These attributes can be queried using the mmlsfs command, and set using the mmcrfs and mmchfs
commands.
1. The -D nfs4 flag is required. Conventional NFS access would not be blocked by concurrent file system
reads or writes (this is the POSIX semantic). NFS V4, however, not only allows its requests to block
if conflicting activity is happening, it insists on it. Because this is an NFS V4-specific requirement, it
must be set before exporting a file system.
flag value description
---- -------------- -----------------------------------------------------
-D nfs4 File locking semantics in effect
2. The -k nfs4 or -k all flag is required. Initially, a file system has the -k posix setting, and only
traditional GPFS ACLs are allowed. To export a file system using NFS V4, NFS V4 ACLs must be
enabled. Since NFS V4 ACLs are vastly different and affect several characteristics of the file system
objects (directories and individual files), they must be explicitly enabled. This is done either
exclusively, by specifying -k nfs4, or by allowing all ACL types to be stored.
flag value description
---- -------------- -----------------------------------------------------
-k all ACL semantics in effect
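Combining the two requirements, a minimal sketch that assumes a hypothetical file system fs1:
mmlsfs fs1 -D -k              # query the current locking and ACL semantics
mmchfs fs1 -D nfs4 -k nfs4    # enable NFS V4 locking semantics and NFS V4 ACLs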
Note: In IBM Spectrum Scale 4.2 and later, NFS connections are limited to a maximum of 2250 for a large
number of NFS exports. The maximum number of NFS exports supported is 1000.
You may also choose to restrict the use of the NFS server node through the normal GPFS path and not
use it as either a file system manager node or an NSD server.
A GPFS soft-mount does not automatically unmount. Setting -fstype nfs3 causes the local server mounts
to always go through NFS. This allows you to have the same auto.map file on all nodes whether the
server is local or not, and the automatic unmount will occur. If you want local soft-mounts of GPFS file
systems while other nodes perform NFS mounts, you should have different auto.map files on the
different classes of nodes. This should improve performance on the GPFS nodes as they will not have to
go through NFS.
The participating nodes are designated as Cluster NFS (CNFS) member nodes and the entire setup is
frequently referred to as CNFS or CNFS cluster.
In this solution, all CNFS nodes export the same file systems to the NFS clients. When one of the CNFS
nodes fails, the NFS serving load moves from the failing node to another node in the CNFS cluster.
Failover is done using recovery groups to help choose the preferred node for takeover.
Applications that depend on exact reporting of changes to the following fields returned by the stat() call
may not work as expected:
1. exact mtime
2. mtime
3. ctime
4. atime
Providing exact support for these fields would require significant performance degradation to all
applications executing on the system. These fields are guaranteed accurate when the file is closed.
These values will be accurate on a node right after it accesses or modifies a file, but may not be accurate
for a short while when a file is accessed or modified on some other node.
If 'exact mtime' is specified for a file system (using the mmcrfs or mmchfs commands with the -E yes
flag), the mtime and ctime values are always correct by the time the stat() call gives its answer. If 'exact
mtime' is not specified, these values will be accurate after a couple of minutes, to allow the
synchronization daemons to propagate the values to all nodes. Regardless of whether 'exact mtime' is
specified, the atime value will be accurate after a couple of minutes, to allow for all the synchronization
daemons to propagate changes.
Alternatively, you may use the GPFS calls, gpfs_stat() and gpfs_fstat() to return exact mtime and atime
values.
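For example, with a hypothetical file system fs1:
mmlsfs fs1 -E        # query the current exact mtime setting
mmchfs fs1 -E yes    # report exact mtime values on stat() calls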
The delayed update of the information returned by the stat() call also impacts system commands which
display disk usage, such as du or df. The data reported by such commands may not reflect changes that
have occurred since the last sync of the file system. For a parallel file system, a sync does not occur until
all nodes have individually synchronized their data. On a system with no activity, the correct values will
be displayed after the sync daemon has run on all nodes.
Note: Even if IBM Spectrum Scale ignores these bits, the SMB service enforces them on the protocol
level.
3. AIX requires that READ_ACL and WRITE_ACL always be granted to the object owner. Although this
contradicts NFS Version 4 protocol, it is viewed that this is an area where users would otherwise
erroneously leave an ACL that only privileged users could change. Because ACLs are themselves file
attributes, READ_ATTR and WRITE_ATTR are similarly granted to the owner. Because it would not make
sense to then prevent the owner from accessing the ACL from a non-AIX node, IBM Spectrum Scale
has implemented this exception everywhere.
4. AIX does not support the use of special name values other than owner@, group@, and everyone@.
Therefore, these are the only valid special name values for use in IBM Spectrum Scale NFS V4 ACLs.
5. NFS V4 allows ACL entries that grant permission to users or groups to change the ownership of a file
with a command such as the chown command. For security reasons, IBM Spectrum Scale now restricts
these permissions so that a nonprivileged user can chown such a file only to himself or to a group that
he or she is a member of.
6. With some limitations, Windows clients that access IBM Spectrum Scale via Samba can use their
native NTFS ACLs, which are mapped to the underlying NFS v4 ACLs. For limitations, see
“Authorization limitations” on page 335.
7. Ganesha supports NFS v4 ACLs to and from IBM Spectrum Scale. However, to export a file system
with cNFS/KNFS, you must configure the file system to support POSIX ACLs. Use the mmcrfs
command with the -k all or -k posix parameter. With Samba, use the -k nfs4 parameter. NFS V4
Linux servers handle ACLs properly only if they are stored in GPFS as POSIX ACLs. For more
information, see “Linux ACLs and extended attributes.”
8. The cluster can include Samba, CES NFS, AIX NFS, and IBM Spectrum Scale Windows nodes.
9. NFS V4 ACLs can be stored in GPFS file systems using Samba exports, NFS V4 AIX servers, GPFS
Windows nodes, aclput, and mmputacl. Clients of Linux V4 servers cannot see stored ACLs but can
see the permissions from the mode.
For more information about ACLs and NFS export, see Chapter 22, “Managing GPFS access control lists,”
on page 311.
Although the NFS V4 protocol defines a richer ACL model similar to Windows ACLs, the Linux
implementation maps those ACLs to POSIX ACLs before passing them to the underlying file system.
NFS V4 ACLs are more fine-grained than POSIX ACLs, so the POSIX-to-NFS V4 translation is close to
perfect, but the NFS V4-to-POSIX translation is not. The NFS V4 server attempts to err on the side of
mapping to a stricter ACL. There is a very small set of NFS V4 ACLs that the server rejects completely
(for example, any ACL that attempts to explicitly DENY rights to read attributes), but otherwise, the
server tries very hard to accept ACLs and map them as best it can.
ACLs that are set through AIX/NFS V4 and Windows nodes are written as NFS V4 ACLs to GPFS,
whereas ACLs that are set through Linux/NFS V4 are written as POSIX ACLs to GPFS. Currently, GPFS
does not provide an interface to convert on-disk NFS V4 ACLs to POSIX ACLs. This means that if ACLs
are written through either AIX/NFS V4 or Windows, they cannot be read by Linux/NFS V4. In this case,
a Linux NFS V4 server constructs an ACL from the permission mode bits only and ignores the ACL on
the file.
When mounting an NFSv3 file system on a protocol node, the Linux kernel lockd daemon registers with
the rpcbind, preventing the CES NFS lock service from taking effect. If you need to mount an NFSv3 file
system on a CES NFS protocol node, use the -o nolock mount option to prevent invoking the Linux
kernel lockd daemon.
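A hedged example with a placeholder server name and paths:
mount -t nfs -o vers=3,nolock nfsserver.example.com:/remote/export /mnt/nfs3data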
Using direct I/O may provide some performance benefits in the following cases:
v The file is accessed at random locations.
v There is no access locality.
Direct transfer between the user buffer and the disk can only happen if all of the following conditions are
true:
v The number of bytes transferred is a multiple of 512 bytes.
v The file offset is a multiple of 512 bytes.
v The user memory buffer address is aligned on a 512-byte boundary.
When these conditions are not all true, the operation will still proceed but will be treated more like other
normal file I/O, with the O_SYNC flag that flushes the dirty buffer to disk.
When these conditions are all true, the GPFS page pool is not used because the data is transferred
directly; therefore, an environment in which most of the I/O volume is due to direct I/O (such as in
databases) will not benefit from a large page pool. Note, however, that the page pool still needs to be
configured with an adequate size, or left at its default value, because the page pool is also used to store
file metadata (especially for the indirect blocks required for large files).
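As a quick illustration of the alignment rules, the following hypothetical dd invocation issues direct I/O
with a block size that is a multiple of 512 bytes, so the transfers can bypass the page pool:
dd if=/dev/zero of=/gpfs/fs1/dio_testfile bs=1M count=64 oflag=direct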
With direct I/O, the application is responsible for coordinating access to the file, and neither the overhead
nor the protection provided by GPFS locking mechanisms plays a role. In particular, if two threads or
nodes perform direct I/O concurrently on overlapping portions of the file, the outcome is undefined. For
example, when multiple writes are made to the same file offsets, it is undetermined which of the writes
will be on the file when all I/O is completed. In addition, if the file has data replication, it is not
guaranteed that all the data replicas will contain the data from the same writer. That is, the contents of
each of the replicas could diverge.
Even when the I/O requests are aligned as previously listed, in the following cases GPFS will not transfer
the data directly and will revert to the slower buffered behavior:
v The write causes the file to increase in size.
v The write is in a region of the file that has been preallocated (via gpfs_prealloc()) but has not yet
been written.
v The write is in a region of the file where a “hole” is present; that is, the file is sparse and has some
unallocated regions.
When direct I/O requests are aligned but none of the previously listed conditions (that would cause the
buffered I/O path to be taken) are present, handling is optimized this way: the request is completely
handled in kernel mode by the GPFS kernel module, without the GPFS daemon getting involved. Any of
the following conditions, however, will still result in the request going through the daemon:
v The I/O operation needs to be served by an NSD server.
v The file system has data replication. In the case of a write operation, the GPFS daemon is involved to
produce the log records that ensure that the replica contents are identical (in case of a failure while
writing the replicas to disk).
v The operation is performed on the Windows operating system.
Note that setting the O_DIRECT flag on an open file with fcntl (fd, F_SETFL,[..]), which may be
allowed on Linux, is ignored in a GPFS file system.
Because of a limitation in Linux, I/O operations with O_DIRECT should not be issued concurrently with a
fork(2) system call that is invoked by the same process. Any calls to fork() in the program should be
issued only after O_DIRECT I/O operations are completed. That is, fork() should not be invoked while
O_DIRECT I/O operations are still pending completion. For more information, see the open(2) system call
in the Linux documentation.
The ability to access and mount GPFS file systems owned by other clusters in a network of sufficient
bandwidth is accomplished using the mmauth, mmremotecluster and mmremotefs commands. Each site
in the network is managed as a separate cluster, while allowing shared file system access.
The cluster owning the file system is responsible for administering the file system and granting access to
other clusters on a per cluster basis. After access to a particular file system has been granted to nodes in
another GPFS cluster, the nodes can mount the file system and perform data operations as if the file
system were locally owned.
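The following sketch outlines the typical command sequence with hypothetical cluster names, node names,
and key file locations; clusterA accesses file system fs1 that is owned by clusterB. See the mmauth,
mmremotecluster, and mmremotefs commands for the complete procedure, including exchanging the
public key files between the clusters:
# On the owning cluster (clusterB):
mmauth genkey new
mmauth update . -l AUTHONLY
mmauth add clusterA.example.com -k /tmp/clusterA_id_rsa.pub
mmauth grant clusterA.example.com -f fs1
# On the accessing cluster (clusterA):
mmremotecluster add clusterB.example.com -n nodeB1,nodeB2 -k /tmp/clusterB_id_rsa.pub
mmremotefs add rfs1 -f fs1 -C clusterB.example.com -T /gpfs/rfs1
mmmount rfs1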
Each node in the GPFS cluster requiring access to another cluster's file system must be able to open a
TCP/IP connection to every node in the other cluster.
Nodes in two separate remote clusters mounting the same file system are not required to be able to open
a TCP/IP connection to each other. For example, if a node in clusterA mounts a file system from
clusterB, and a node in clusterC desires to mount the same file system, nodes in clusterA and clusterC
do not have to communicate with each other.
Each node in the GPFS cluster requiring file system access must have one of the following:
v A virtual connection to the file system data through an NSD server (refer to Figure 8 on page 348).
v A physical connection to the disks containing file system data (refer to Figure 9 on page 348).
In this example, network connectivity is required from the nodes in clusterB to all the nodes in clusterA
even if the nodes in clusterB can access the disks in clusterA directly.
Note: Even when remote nodes have direct connectivity to the SAN, they will still use a connection
through an NSD server for any NSDs that have been configured with Persistent Reserve (PR). If you
want the remote nodes to access the disks through their direct connection to the SAN, you must ensure
that PR is not enabled on the NSDs. See “Enabling and disabling Persistent Reserve” on page 174.
Figure 10 on page 349 illustrates a multi-cluster configuration with multiple NSD servers. In this
configuration:
v The two nodes in Cluster 1 are defined as the NSD servers (you can have up to eight NSD server
nodes).
v All three clusters are connected with Gigabit Ethernet.
v Cluster 1 shares an InfiniBand switch network with Cluster 2 and an InfiniBand switch network with
Cluster 3.
In order to take advantage of the fast networks and to use the nodes in Cluster 1 as NSD servers for
Cluster 2 and Cluster 3, you must configure a subnet for each of the supported clusters. For example
issuing the command:
v mmchconfig subnets="<IB_Network_1> <IB_Network_1>/Cluster1" in Cluster 2 allows nodes N2
through Nx to use N1 as an NSD server with InfiniBand Network 1 providing the path to the data.
v mmchconfig subnets="<IB_Network_2> <IB_Network_2>/Cluster1" in Cluster 3 allows nodes N2+x
through Ny+x to use N1+x as an NSD server with InfiniBand Network 2 providing the path to the data.
When you implement file access from other clusters, consider these topics:
v “Remote user access to a GPFS file system”
v “Mounting a remote GPFS file system” on page 354
v “Managing remote access to a GPFS file system” on page 356
v “Using remote access with multiple network definitions” on page 356
v “Using multiple security levels for remote access” on page 358
v “Changing security keys with remote access” on page 359
v “Important information about remote access” on page 361
For consistency of ownership and access control, a uniform user identity namespace is preferred. For
example, if user Jane Doe has an account on nodeA with the user name janedoe and user ID 1001 and
group ID 500, on all other nodes in the same cluster Jane Doe will have an account with the same user
and group IDs. GPFS relies on this behavior to perform file ownership and access control tasks.
If a GPFS file system is being accessed from a node belonging to another GPFS cluster, the assumption
about the uniform user account infrastructure might no longer be valid. Since different clusters can be
administered by different organizations, it is possible for each of the clusters to have a unique set of user
accounts. This presents the problem of how to permit users to access files in a file system owned and
served by another GPFS cluster. In order to have such access, the user must be somehow known to the
other cluster. This is usually accomplished by creating a user account in the other cluster, and giving this
account the same set of user and group IDs that the account has in the cluster where the file system was
created.
To continue with this example, Jane Doe would need an account with user ID 1001 and group ID 500
created in every other GPFS cluster from which remote GPFS file system access is desired. This approach
is commonly used for access control in other network file systems, (for example, NFS or AFS®), but might
pose problems in some situations.
Access from a remote cluster by a root user presents a special case. It is often desirable to disallow root
access from a remote cluster while allowing regular user access. Such a restriction is commonly known as
root squash. A root squash option is available when making a file system available for mounting by other
clusters using the mmauth command. This option is similar to the NFS root squash option. When
enabled, it causes GPFS to squash superuser authority on accesses to the affected file system on nodes in
remote clusters.
This is accomplished by remapping the credentials: user id (UID) and group id (GID) of the root user, to
a UID and GID specified by the system administrator on the home cluster, for example, the UID and GID
of the user nobody. In effect, root squashing makes the root user on remote nodes access the file system
as a non-privileged user.
Although enabling root squash is similar to setting up UID remapping, there are two important
differences:
1. While enabling UID remapping on remote nodes is an option available to the remote system
administrator, root squashing need only be enabled on the local cluster, and it will be enforced on
remote nodes. Regular UID remapping is a user convenience feature, while root squashing is a
security feature.
2. While UID remapping requires having an external infrastructure for mapping between local names
and globally unique names, no such infrastructure is necessary for enabling root squashing.
When both UID remapping and root squashing are enabled, root squashing overrides the normal UID
remapping mechanism for the root user.
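For example, a hedged sketch that grants clusterA read/write access to file system fs1 while squashing
remote root to UID and GID 99; the -r option syntax is an assumption here and should be verified against
the mmauth command description for your release:
mmauth grant clusterA.example.com -f fs1 -a rw -r 99:99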
Figure 11. High-level flow of protocols on remotely mounted file systems
This configuration allows you to separate the tasks that each cluster performs. The storage cluster owns
the file systems and the storage. Protocol clusters contain the protocol nodes that provide access to the
remotely mounted file system through NFS or SMB. In this configuration, each cluster is managed
independently. For more
information, see “Important information about remote access” on page 361.
Here, the storage cluster owns a file system and the protocol cluster remotely mounts the file system. The
protocol nodes (CES nodes) in the protocol cluster export the file system via SMB and NFS.
You can define one set of protocol nodes per cluster, using multiple independent protocol clusters which
remotely mount file systems. Protocol clusters can share access to a storage cluster but not to a file
system. Each protocol cluster requires a dedicated file system. Each protocol cluster can have a different
authentication configuration, thus allowing different authentication domains while keeping the data at a
central location. Another benefit is the ability to access existing ESS-based file systems through NFS or
SMB without adding nodes to the ESS cluster.
This procedure assumes an environment in which the server, network, storage, and operating systems are
installed and ready for IBM Spectrum Scale. For more information, see Installing IBM Spectrum Scale on
Linux nodes and deploying protocols in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Note: Proceed with creating the storage cluster and one or more protocol clusters. Ensure that the configuration parameter maxBlockSize is set to the same value on all clusters.
3. Create file systems on the storage cluster, taking the following into consideration:
v CES shared root file system – Each protocol cluster requires its own CES shared root file system.
Having a shared root file system that is different from the file system that serves data eases the
management of CES.
v Data file systems – At least one file system is required for each protocol cluster configured for
Cluster Export Services. A data file system can only be exported from a single protocol cluster.
4. Before installing and configuring Cluster Export Services, consider the following points:
v Authentication - Separate authentication schemes are supported for each CES cluster.
v ID mapping - The ID mapping of users that authenticate to each CES cluster. Unique ID mapping across clusters is recommended but not mandatory.
Note: You must judiciously determine the ID mapping requirements and prevent possible
interference or security issues.
v GUI - GUI support for remote clusters is limited. Each cluster should have its own GUI. The GUI
may be installed onto CES nodes but performance must be taken into consideration.
v Object - Object is not supported in multi-cluster configurations.
v For a list of limitations, see Limitations of protocols on remotely mounted file systems in IBM Spectrum
Scale: Administration Guide.
5. Configure clusters for remote mount. For more information, see Mounting a remote GPFS file system in
IBM Spectrum Scale: Administration Guide.
6. Install and configure Cluster Export Services by using the installation toolkit or manually. For more
information, see Installing IBM Spectrum Scale on Linux nodes with the installation toolkit and Manually
installing the IBM Spectrum Scale software packages on Linux nodes in IBM Spectrum Scale: Concepts,
Planning, and Installation Guide.
Note: Use the remotely mounted CES shared root file system.
7. Once SMB and/or NFS is enabled, new exports can be created on the remotely mounted data file
system.
Consider the following aspects while you manage a multi-cluster protocol environment:
v Each cluster requires its own GUI. The GUI might be installed onto the CES nodes but performance
must be taken into consideration.
v Each cluster has its own REST API.
v Each cluster has its own health monitoring. This means that error events that are raised in the storage
cluster are not visible in the protocol cluster and vice versa.
v Availability of certain performance metrics depends on the role of the cluster. That is, NFS metrics are
available on protocol clusters only.
v Each cluster is installed and upgraded independently.
Once all clusters in the environment are upgraded, the release and the file system version should be
changed. The release version might be changed concurrently. However, changing the file system version
requires the file system to be unmounted. To view the differences between file system versions, see the
Listing file system attributes topic in the IBM Spectrum Scale: Administration Guide.
To change the IBM Spectrum Scale release, issue the following command on each cluster:
mmchconfig release=LATEST
Note: Nodes that run an older version of IBM Spectrum Scale on the remote cluster will no longer be able to mount the file system. The command fails if any nodes that run an older version have the file system mounted at the time the command is issued.
To change the file system version, issue the following command for each file system on the storage
cluster:
mmchfs <fs> -V full
Note: This also means that certain operations, such as the creation of filesets and snapshots, do not work in the GUI.
v Remote file systems are not mounted automatically, which results in client errors when a CES node in a protocol cluster is restarted.
v Cross-protocol change notifications will not work on remotely-mounted file systems. For example, if an
NFS client changes a file, the system will not issue a "file change" notification to the SMB client which
has asked for a notification.
The package gpfs.gskit must be installed on all the nodes of the owning cluster and the accessing
cluster. For more information, see the installation chapter for your operating system, such as Installing GPFS on Linux nodes and deploying protocols in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
The procedure to set up remote file system access involves the generation and exchange of authorization
keys between the two clusters. In addition, the administrator of the GPFS cluster that owns the file
system needs to authorize the remote clusters that are to access it, while the administrator of the GPFS
cluster that seeks access to a remote file system needs to define to GPFS the remote cluster and file
system whose access is desired.
Note: For more information on CES cluster setup, see “CES cluster setup” on page 467.
In this example, owningCluster is the cluster that owns and serves the file system to be mounted and
accessingCluster is the cluster that accesses owningCluster.
Note: The following example uses AUTHONLY as the authorization setting. When you specify
AUTHONLY for authentication, GPFS checks network connection authorization. However, data sent over
the connection is not protected.
1. On owningCluster, the system administrator issues the mmauth genkey command to generate a
public/private key pair. The key pair is placed in /var/mmfs/ssl. The public key file is id_rsa.pub.
mmauth genkey new
2. On owningCluster, the system administrator enables authorization by entering the following
command:
mmauth update . -l AUTHONLY
where:
mygpfs Is the device name under which the file system is known in accessingCluster.
gpfs Is the device name for the file system in owningCluster.
owningCluster
Is the name of owningCluster as given by the mmlscluster command on a node in
owningCluster.
/mygpfs
Is the local mount point in accessingCluster.
11. On accessingCluster, the system administrator enters the mmmount command to mount the file
system:
mmmount mygpfs
mmremotefs add rfs1 -f fs1 -C owningCluster -T /rfs1
mmauth grant accessingCluster -f fs1 ...
To see a list of all clusters authorized to mount file systems owned by cluster1, the administrator of
cluster1 issues this command:
mmauth show
To authorize a third cluster, say cluster3, to access file systems owned by cluster1, the administrator of
cluster1 issues this command:
mmauth add cluster3 -k cluster3_id_rsa.pub
mmauth grant cluster3 -f /dev/gpfs1
To subsequently revoke cluster3 authorization to access a specific file system gpfs1 owned by cluster1,
the administrator of cluster1 issues this command:
mmauth deny cluster3 -f /dev/gpfs1
To completely revoke cluster3 authorization to access file systems owned by cluster1, the administrator of
cluster1 issues this command:
mmauth delete cluster3
Use the mmchconfig command, subnets attribute, to specify the private IP addresses to be accessed by
GPFS.
Figure 12 on page 358 describes an AIX cluster named CL1 with nodes named CL1N1, CL1N2, and so
forth, a Linux cluster named CL2 with nodes named CL2N1, CL2N2, and another Linux cluster named
CL3 with a node named CL3N1. Both Linux clusters have public Ethernet connectivity, and a Gigabit Ethernet network connects them. The CL1 nodes also communicate with each other over an InfiniBand switch on the 7.2.24.0 subnet.
With the use of both public and private IP addresses for some of the nodes, the setup works as follows:
1. All clusters must be created using host names or IP addresses that correspond to the public network.
2. Using the mmchconfig command for the CL1 cluster, add the attribute: subnets=7.2.24.0.
This allows all CL1 nodes to communicate using the InfiniBand Switch. Remote mounts between CL2
and CL1 will use the public Ethernet for TCP/IP communication, since the CL2 nodes are not on the
7.2.24.0 subnet.
GPFS assumes subnet specifications for private networks are independent between clusters (private
networks are assumed not physically connected between clusters). The remaining steps show how to
indicate that a private network is shared between clusters.
3. Using the mmchconfig command for the CL2 cluster, add the attribute subnets='10.200.0.0/CL2.kgn.ibm.com;CL3.kgn.ibm.com'. Alternatively, regular expressions are allowed here, such as subnets='10.200.0.0/CL[23].kgn.ibm.com'. See note 2 for the syntax allowed for the regular expressions.
This attribute indicates that the private 10.200.0.0 network extends to all nodes in clusters CL2 or CL3.
This way, any two nodes in the CL2 and CL3 clusters can communicate through the Gigabit Ethernet.
This setting allows all CL2 nodes to communicate over their Gigabit Ethernet. Matching
CL3.kgn.ibm.com with the cluster list for 10.200.0.0 allows remote mounts between clusters CL2 and
CL3 to communicate over their Gigabit Ethernet.
4. Using the mmchconfig command for the CL3 cluster, add the attribute subnets='10.200.0.0/CL3.kgn.ibm.com;CL2.kgn.ibm.com', or alternatively subnets='10.200.0.0/CL[32].kgn.ibm.com'.
This attribute indicates that the private 10.200.0.0 network extends to all nodes in clusters CL2 or CL3.
This way, any two nodes in the CL2 and CL3 clusters can communicate through the Gigabit Ethernet.
Matching of CL3.kgn.ibm.com with the cluster list for 10.200.0.0 allows all CL3 nodes to communicate
over their Gigabit Ethernet, and matching CL2.kgn.ibm.com with that list allows remote mounts
between clusters CL3 and CL2 to communicate over their Gigabit Ethernet.
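As a sketch, the settings in steps 2 through 4 correspond to commands like the following, issued from a node in CL1, CL2, and CL3 respectively:
mmchconfig subnets='7.2.24.0'
mmchconfig subnets='10.200.0.0/CL[23].kgn.ibm.com'
mmchconfig subnets='10.200.0.0/CL[32].kgn.ibm.com'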
Use the subnets attribute of the mmchconfig command when you wish the GPFS cluster to leverage
additional, higher performance network connections that are available to the nodes in the cluster, or
between clusters.
Notes:
1. Use of the subnets attribute does not ensure a highly available system. If the GPFS daemon is using
the IP address specified by the subnets attribute, and that interface goes down, GPFS does not switch
to the other network. You can use mmdiag --network to verify that the subnet is in fact being used.
2. Each subnet can be listed at most once in each cluster. For example, specifying:
subnets=’10.200.0.0/CL2.kgn.ibm.com 10.200.0.0/CL3.kgn.ibm.com’
where the 10.200.0.0 subnet is listed twice, is not allowed. Therefore, subnets that span multiple
clusters have to be assigned a cluster name pattern or a semicolon-separated cluster name list. It is
possible to combine these, for example, items in semicolon-separated cluster lists can be plain names
or regular expressions, as in the following:
subnets=’1.0.0.1/CL[23].kgn.ibm.com;OC.xyz.ibm.com’
The following shows examples of patterns that are accepted:
[af3] matches letters ’a’ and ’f’, and number 3
[0-7] matches numbers 0, 1, ... 7
[a-p0-7] matches letter a, b, ... p and numbers from 0 to 7 inclusive
* matches any sequence of characters
? matches any (one) character
If node CL2N1 on cluster CL2.kgn.ibm.com has network interfaces with IP addresses 10.200.0.1 and 10.201.0.1, and node CL3N1 on cluster CL3.kgn.ibm.com has network interfaces with IP addresses 10.200.0.5 and 10.201.0.5, then the communication between these two nodes will flow over the 10.200.0.0 subnet, with CL2N1 using the interface with IP address 10.200.0.1, and CL3N1 using the interface with IP address 10.200.0.5.
Specifying a cluster name or a cluster name pattern for each subnet is only needed when a private
network is shared across clusters. If the use of a private network is confined within the local cluster,
then no cluster name is required in the subnet specification.
Figure 12. Use of public and private IP addresses in three GPFS clusters (the figure shows cluster CL1 with nodes at 7.2.24.1, 7.2.24.2, and 7.2.24.3, and clusters CL2 and CL3 with nodes at 7.2.28.20, 7.2.28.21, and 7.2.28.22 on the public Ethernet network 7.2.28/22)
When multiple security levels are specified, each connection must use the security level of the connecting node unless that security level is AUTHONLY. In this case, the security level of the node that accepts the connection is used instead.
To specify a different security level for different clusters that request access to a specified cluster, use the
mmauth -l cipherList command. Several examples follow to illustrate:
1. In this example, cluster1 and cluster2 are on the same trusted network, and cluster3 is connected to
both of them with an untrusted network. The system administrator chooses these security levels:
v A cipherList of AUTHONLY for connections between cluster1 and cluster2
v A cipherList of AES128-SHA for connections between cluster1 and cluster3
v A cipherList of AES128-SHA for connections between cluster2 and cluster3
The administrator of cluster1 issues these commands:
mmauth add cluster2 -k keyFile -l AUTHONLY
mmauth add cluster3 -k keyFile -l AES128-SHA
2. In this example, cluster2 is accessing file systems that are owned by cluster1 by using a cipherList of
AUTHONLY, but the administrator of cluster1 decides to require a more secure cipherList. The
administrator of cluster1 issues this command:
mmauth update cluster2 -l AES128-SHA
Existing connections are upgraded from AUTHONLY to AES128-SHA.
To accomplish this, the cluster that owns and serves the file system is made to temporarily have two
access keys (referred to as the 'old key' and the 'new key'), which are both valid at the same time. The
clusters currently accessing the file system can then change from the old key to the new key without
interruption of file system access.
In this example, cluster1 is the name of the cluster that owns and serves a file system, and cluster2 is the
name of the cluster that has already obtained access to this file system, and is currently using it. Here, the
system administrator of cluster1 changes the access key without severing the connection obtained by
cluster2.
1. On cluster1, the system administrator issues the mmauth genkey new command to generate a new
public/private access key pair. The key pair is placed in /var/mmfs/ssl:
mmauth genkey new
After this command is issued, cluster1 will have two keys (referred to as the 'old key' and the 'new
key') that can both be used to access cluster1 file systems.
2. The system administrator of cluster1 now gives the file /var/mmfs/ssl/id_rsa.pub (that contains the
new key) to the system administrator of cluster2, who desires to continue to access the cluster1 file
systems. This operation requires the two administrators to coordinate their activities, and must occur
outside of the GPFS command environment.
3. On cluster2, the system administrator issues the mmremotecluster update command to make the new
key known to his system:
mmremotecluster update cluster1 -k cluster1_id_rsa.pub
where:
cluster1
Is the real name of cluster1 as given by the mmlscluster command on a node in cluster1.
Similarly, the administrator of cluster2 might decide to change the access key for cluster2:
1. On cluster2, the system administrator issues the mmauth genkey new command to generate a new
public/private access key pair. The key pair is placed in /var/mmfs/ssl:
mmauth genkey new
After this command is issued, cluster2 will have two keys (referred to as the 'old key' and the 'new
key') that can both be used when a connection is established to any of the nodes in cluster2.
2. The system administrator of cluster2 now gives the file /var/mmfs/ssl/id_rsa.pub (that contains the
new key) to the system administrator of cluster1, the owner of the file systems. This operation
requires the two administrators to coordinate their activities, and must occur outside of the GPFS
command environment.
3. On cluster1, the system administrator issues the mmauth update command to make the new key
known to his system:
mmauth update cluster2 -k cluster2_id_rsa.pub
where:
cluster2
Is the real name of cluster2 as given by the mmlscluster command on a node in cluster2.
cluster2_id_rsa.pub
Is the name of the file obtained from the administrator of cluster2 in Step 2.
This permits the cluster desiring to mount the file system to continue mounting file systems owned
by cluster1.
4. The system administrator of cluster2 verifies that the administrator of cluster1 has received the new
key and activated it using the mmauth update command.
5. On cluster2, the system administrator issues the mmauth genkey commit command to commit the
new key as the only valid access key. The old key will no longer be accepted once this command
completes successfully:
mmauth genkey commit
NIST compliance
The nistCompliance configuration variable allows the system administrator to restrict the set of available
algorithms and key lengths to a subset of those approved by NIST.
The nistCompliance variable applies to security transport (tscomm security, key retrieval) only, not to
encryption, which always uses NIST-compliant mechanisms.
For the valid values for nistCompliance, see mmchconfig command in the IBM Spectrum Scale: Command and
Programming Reference guide.
The nistCompliance configuration variable was introduced in version 4.1. Clusters created prior to that release operate with the equivalent of that variable being set to off. Similarly, clusters that were created on prior versions and then migrated to 4.1 have nistCompliance set to off.
A cluster created on version 4.1 or higher, and operating with nistCompliance set to SP800-131A, will be
unable to remote-mount a file system from a version 3.5 cluster, since the 4.1 cluster will not accept the
key from the latter, which is not NIST SP800-131A-compliant. To allow the version 4.1 cluster to remote-mount the version 3.5 cluster, issue the following command on the version 4.1 cluster before you issue the mmremotecluster add command:
mmchconfig nistCompliance=off
The key exchange works even if the version 4.1 cluster already has a NIST-compliant key.
A cluster upgraded from prior versions may have nistCompliance set to off and may be operating with keys that are not NIST SP800-131A-compliant. To upgrade the cluster to operate in NIST SP800-131A mode, follow this procedure:
From a node in the cluster which is running version 4.1 or later, issue:
mmauth genkey new
mmauth genkey commit
If remote clusters are present, follow the procedure described in the “Changing security keys with remote
access” on page 359 section (under Chapter 25, “Accessing a remote GPFS file system,” on page 347) to
update the key on the remote clusters.
Once all nodes in the cluster are running at least version 4.1, run the following command from one of the
nodes in the cluster:
mmchconfig release=LATEST
From one of the nodes in the cluster, run the following command:
mmchconfig nistCompliance=SP800-131A
When working with GPFS file systems accessed by nodes that belong to other GPFS clusters, consider the
following points:
1. A file system is administered only by the cluster where the file system was created. Other clusters
may be allowed to mount the file system, but their administrators cannot add or delete disks, change
characteristics of the file system, enable or disable quotas, run the mmfsck command, and so forth.
The only commands that other clusters can issue are list type commands, such as: mmlsfs, mmlsdisk,
mmlsmount, and mmdf.
Using these tools, GPFS can automatically determine where to physically store your data regardless of its
placement in the logical directory structure. Storage pools, filesets and user-defined policies provide the
ability to match the cost of your storage resources to the value of your data.
Note: This feature is available with IBM Spectrum Scale Standard Edition or higher.
To work with ILM in the GUI, click Files > Information Lifecycle.
Use the following information to create and manage information lifecycle management policies in IBM
Spectrum Scale:
Storage pools
Physically, a storage pool is a collection of disks or RAID arrays. Storage pools also allow you to group
multiple storage systems within a file system.
Using storage pools, you can create tiers of storage by grouping storage devices based on performance,
locality, or reliability characteristics. For example, one pool could be an enterprise class storage system
that hosts high-performance Fibre Channel disks and another pool might consist of numerous disk
controllers that host a large set of economical SATA disks.
There are two types of storage pools in GPFS, internal storage pools and external storage pools. Internal
storage pools are managed within GPFS. External storage pools are managed by an external application
such as IBM Spectrum Protect. For external storage pools, GPFS provides tools that allow you to define
an interface that your external storage manager uses to access your data. GPFS does not manage the data
placed in external storage pools. Instead, GPFS manages the movement of data to and from external
storage pools. Storage pools allow you to perform complex operations such as moving, mirroring, or
deleting files across multiple storage devices, providing storage virtualization and a single management
context.
| For more information, see the following subtopics on internal storage pools and external storage pools.
The internal GPFS storage pool to which a disk belongs is specified as an attribute of the disk in the
GPFS cluster. You specify the disk attributes as a field in each disk descriptor when you create the file
system or when adding disks to an existing file system. GPFS allows a maximum of eight internal storage
pools per file system. One of these storage pools is the required system storage pool. The other seven
internal storage pools are optional user storage pools.
GPFS assigns file data to internal storage pools under these circumstances:
v When the file is initially created; the storage pool is determined by the file placement policy that is in effect at the time of file creation.
v When the attributes of the file, such as file size or access time, match the rules of a policy that directs
GPFS to migrate the data to a different storage pool.
The system storage pool can also contain user data. There is only one system storage pool per file
system, and it is automatically created when the file system is created.
Important: It is recommended that you use highly-reliable disks and replication for the system storage
pool because it contains system metadata.
The amount of metadata grows as you add files to the system. Therefore, it is recommended that you
monitor the system storage pool to ensure that there is always enough space to accommodate growth.
The system.log storage pool is an optional dedicated storage pool that contains only the file system recovery logs. If you define this pool, then IBM Spectrum Scale uses it for all the file system recovery logs of the file system. Otherwise, the file system recovery logs are kept in the system storage pool. The file system recovery log is only stored in one pool.
This storage pool must be created explicitly. It is highly recommended to use only storage that is as fast as or faster than the storage that is used for the system storage pool. This recommendation is because of the high number of small synchronous data updates made to the recovery log. The block size for the system.log pool must be the same as the block size of the system pool. If the storage is nonvolatile, this pool can be used for the high-availability write cache (HAWC).
In addition, file data can be migrated to a different storage pool according to your file management
policies. For more information on policies, see “Policies for automating file management” on page 370.
A user storage pool only contains the blocks of data (user data, for example) that make up a user file.
GPFS stores the data that describes the files, called file metadata, separately from the actual file data in the
system storage pool. You can create one or more user storage pools, and then create policy rules to
indicate where the data blocks for a file should be stored.
A file's access temperature is an attribute for policy that provides a means of optimizing tiered storage.
File temperature is a relative attribute; it indicates whether a file is “hotter” or “colder” than the others in its pool. The policy can be used to migrate hotter files to higher tiers and colder files to lower tiers. The access temperature is an exponential moving average of the accesses to the file. As files are accessed, the temperature increases; likewise, when the access stops, the file cools. File temperature is intended to optimize nonvolatile storage, not memory usage; therefore, cache hits are not counted. Similarly, only user accesses are counted.
The access counts to a file are tracked as an exponential moving average. An unaccessed file loses a
percentage of its accesses each period. The loss percentage and period are set via the configuration
variables fileHeatLossPercent and fileHeatPeriodMinutes. By default, the file access temperature is not
tracked. To use access temperature in policy, the tracking must first be enabled. To do this, set the two
configuration variables as follows:
fileHeatLossPercent
The percentage (between 0 and 100) of file access temperature dissipated over the
fileHeatPeriodMinutes time. The default value is 10.
fileHeatPeriodMinutes
The number of minutes defined for the recalculation of file access temperature. To turn on
tracking, fileHeatPeriodMinutes must be set to a nonzero value. The default value is 0.
The following example sets fileHeatPeriodMinutes to 1440 (24 hours) and fileHeatLossPercent to 10,
meaning that unaccessed files will lose 10% of their heat value every 24 hours, or approximately 0.4%
every hour (because the loss is continuous and “compounded” geometrically):
mmchconfig fileheatperiodminutes=1440,fileheatlosspercent=10
Note: If the updating of the file access time (atime) is suppressed or if relative atime semantics are in
effect, proper calculation of the file access temperature may be adversely affected.
File access temperature is tracked on a per-cluster basis, not on a per–file system basis.
Use WEIGHT(FILE_HEAT) with a policy MIGRATE rule to prioritize migration by file temperature.
(You can use the GROUP POOL rule to define a group pool to be specified as the TO POOL target.) See
“Policies for automating file management” on page 370.
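For example, a minimal rule of this kind (the pool names sata and ssd are hypothetical) migrates the hottest files from one pool to another:
RULE ’hotFirst’ MIGRATE FROM POOL ’sata’ WEIGHT(FILE_HEAT) TO POOL ’ssd’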
The storage pool to which a disk belongs is an attribute of each disk and is specified as a field in each
disk descriptor when the file system is created using the mmcrfs command or when disks are added to
an existing file system with the mmadddisk command. Adding a disk with a new storage pool name in
the disk descriptor automatically creates the storage pool.
If a storage pool is not specified, the disk is by default assigned to the system storage pool.
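For example, a hypothetical NSD stanza passed to the mmadddisk command might assign a disk to a user pool named sp1 (the NSD name is a placeholder):
%nsd:
  nsd=gpfs10nsd
  usage=dataOnly
  failureGroup=1
  pool=sp1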
The --metadata-block-size flag on the mmcrfs command can be used to create a system pool with a
different block size from the user pools. This can be especially beneficial if the default block size is larger
than 1 MB. If data and metadata block sizes differ, the system pool must contain only metadataOnly
disks.
Once a disk is assigned to a storage pool, the pool assignment cannot be changed by using either the mmchdisk command or the mmrpldisk command. To move a disk to a different storage pool, delete the disk from the file system with the mmdeldisk command and add it back with the mmadddisk command, specifying the new storage pool in its disk descriptor.
You can change the storage pool that a file is assigned to.
A root user can change the storage pool that a file is assigned to by either:
v Running mmapplypolicy with an appropriate set of policy rules.
v Issuing the mmchattr -P command.
By default, both of these commands migrate data immediately (this is the same as using the -I yes option for these commands). If desired, you can delay migrating the data by specifying the -I defer option for either command. With the defer option, the existing data is not moved to the new storage pool until either the mmrestripefs command or the mmrestripefile command is executed, as shown in the example after the following references. For additional information, refer to:
v “Overview of policies” on page 370
v “Rebalancing files in a storage pool” on page 369
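For example, the following hypothetical sequence reassigns a file to a pool named sp2 with deferred data movement and then migrates the data (the file path and pool name are placeholders):
mmchattr -P sp2 -I defer /gpfs/fs1/bigfile
mmrestripefile -p /gpfs/fs1/bigfile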
The system storage pool, the system.log pool, and user storage pools have different deletion requirements. The system storage pool cannot be deleted directly; it is removed only when you delete the file system itself.
You can delete the system.log pool by deleting all the disks in the system.log pool. You do not need to
run a policy to empty the system.log pool first, because the system.log pool can only contain log files,
and those are automatically migrated to the System pool when you delete the system.log pool.
In order to delete a user storage pool, you must delete all its disks using the mmdeldisk command. When
GPFS deletes the last remaining disk from a user storage pool, the storage pool is also deleted. To delete
a storage pool, it must be completely empty. A migration policy along with the mmapplypolicy command
could be used to do this.
To list the storage pools available for a specific file system, issue the mmlsfs -P command.
flag value description
------------------- ------------------------ -----------------------------------
-P system;sp1;sp2 Disk storage pools in file system
For file system fs1, there are three storage pools: the system storage pool and user storage pools named
sp1 and sp2.
To display the assigned storage pool and the name of the fileset that includes the file, issue the mmlsattr
-L command.
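For example (the path shown is hypothetical):
mmlsattr -L /gpfs/fs1/myfile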
File myfile is assigned to the storage pool named sp1 and is part of the root fileset.
To list the disks belonging to a storage pool, issue the mmdf -P command.
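For example, to display the disks of storage pool sp1 in file system fs1:
mmdf fs1 -P sp1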
This example shows that storage pool sp1 in file system fs1 consists of eight disks and identifies details
for each disk including:
v Name
v Size
v Failure group
v Data type
A root user can rebalance file data across all disks in a file system by issuing the mmrestripefs command.
Optionally:
v Specifying the -P option rebalances only those files assigned to the specified storage pool.
v Specifying the -p option rebalances the file placement within the storage pool. For files that are assigned to one storage pool but have data in a different pool (referred to as ill-placed files), this option migrates their data to the correct pool, as shown in the example after this list. (A file becomes “ill-placed” when the -I defer option is used during migration of the file between pools.)
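For example, to migrate any ill-placed files in file system fs1 to their assigned storage pools:
mmrestripefs fs1 -p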
To enable data replication in a storage pool, you must make certain that there are at least two failure
groups within the storage pool.
This is necessary because GPFS maintains separation between storage pools and performs file replication
within each storage pool. In other words, a file and its replica must be in the same storage pool. This also
means that if you are going to replicate the entire file system, every storage pool in the file system must
have at least two failure groups.
Note: Depending on the configuration of your file system, if you try to enable file replication in a storage
pool having only one failure group, GPFS will either give you a warning or an error message.
| External pools in IBM Spectrum Scale can be represented by a variety of tools that include IBM Spectrum
| Protect for Space Management (HSM), IBM Spectrum Scale Transparent Cloud Tiering, and IBM Spectrum
| Archive Enterprise Edition (EE). These tools allow files from the IBM Spectrum Scale file system to
| migrate to another storage system that is not directly connected to and managed by IBM Spectrum Scale.
External storage pools use a flexible interface driven by GPFS policy rules that simplify data migration to
and from other types of storage such as tape storage. For additional information, refer to “Policies for
automating file management” on page 370.
You can define multiple external storage pools at any time using GPFS policy rules. To move data to an
external storage pool, the GPFS policy engine evaluates the rules that determine which files qualify for
transfer to the external pool. From that information, GPFS provides a list of candidate files and executes
the script specified in the rule that defines the external pool. That executable script is the interface to the
external application, such as IBM Spectrum Protect, that does the actual migration of data into an
external pool. Using the external pool interface, GPFS gives you the ability to manage information by
allowing you to:
1. Move files and their extended attributes onto low-cost near-line or offline storage when demand for
the files diminishes.
2. Recall the files, with all of their previous access information, onto online storage whenever the files
are needed.
Files that were migrated to an external storage pool are recalled upon the request of an application accessing the file system. Therefore, when you are using external
storage pools, you must use an external file management application such as IBM Spectrum Protect. The
external application is responsible for maintaining the file once it has left the GPFS file system. For
example, GPFS policy rules create a list of files that are eligible for migration. GPFS hands that list to
IBM Spectrum Protect which migrates the files to tape and creates a reference file in the file system that
has pointers to the tape image. When a file is requested, it is automatically retrieved from the external
storage pool and placed back in an internal storage pool. As an alternative, you can use a GPFS policy
rule to retrieve the data in advance of a user request.
The number of external storage pools is limited only by the capabilities of your external application. GPFS allows you to define external storage pools at any time by writing a policy that defines the pool and makes that location known to GPFS. External storage pools are defined by policy rules, and data movement to them is initiated either by storage thresholds or by use of the mmapplypolicy command.
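As a sketch (the interface script path, pool name, and threshold values are hypothetical), a policy might define an external pool and a rule that migrates the least recently accessed files to it when the system pool exceeds a threshold:
RULE ’extpooldef’ EXTERNAL POOL ’hsm’ EXEC ’/usr/local/bin/hsmInterface’
RULE ’migToExt’ MIGRATE FROM POOL ’system’ THRESHOLD(90,80) WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME) TO POOL ’hsm’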
For additional information, refer to “Working with external storage pools” on page 403.
You can create and manage policies and policy rules with both the command line interface and the GUI.
In the GUI, navigate to Files > Information Lifecycle Management.
Overview of policies
A policy is a set of rules that describes the life cycle of user data based on the attributes of files. Each rule
defines an operation or definition, such as “migrate to a pool and replicate the file.”
The file placement policy determines the storage pool where the data of a newly created file is initially placed. Similarly, if the file system has snapshots and a file is written to, the snapshot placement policy determines the storage pool where the snapshot blocks are placed.
The management policy determines file management operations such as migration, deletion, and file
compression or decompression.
In order to migrate or delete data, you must use the mmapplypolicy command. To compress or
decompress data, you can use either the mmapplypolicy command with a MIGRATE rule or the
mmchattr command. You can define the file management rules and install them in the file system
together with the placement rules. As an alternative, you may define these rules in a separate file and
explicitly provide them to mmapplypolicy using the -P option. In either case, policy rules for placement
or migration may be intermixed. Over the life of the file, data can be migrated to a different storage pool
any number of times, and files can be deleted or restored.
Note: In a multicluster environment, the scope of the mmapplypolicy command is limited to the nodes
in the cluster that owns the file system.
Note: File compression or decompression using the mmapplypolicy command is not supported on the
Windows operating system.
File management rules can also be used to control the space utilization of GPFS online storage pools.
When the utilization for an online pool exceeds the specified high threshold value, GPFS can be
configured, through user exits, to trigger an event that can automatically start mmapplypolicy and
reduce the utilization of the pool. Using the mmaddcallback command, you can specify a script that will
run when such an event occurs. For more information, see the topic mmaddcallback command in the IBM
Spectrum Scale: Command and Programming Reference.
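As a sketch (the callback name and script path are hypothetical; verify the event names and parameter variables in the mmaddcallback documentation), a callback might start a policy run when a pool runs low on space:
mmaddcallback MIGRATION --command /usr/local/bin/startpolicy.sh --event lowDiskSpace,noDiskSpace --parms "%eventName %fsName"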
GPFS performs error checking for file-placement policies in the following phases:
v When you install a new policy, GPFS checks the basic syntax of all the rules in the policy.
v GPFS also checks all references to storage pools. If a rule in the policy refers to a storage pool that does
not exist, the policy is not installed and an error is returned.
v When a new file is created, the rules in the active policy are evaluated in order. If an error is detected,
GPFS logs an error, skips all subsequent rules, and returns an EINVAL error code to the application.
v Otherwise, the first applicable rule is used to store the file data.
Default file placement policy:
When a GPFS file system is first created, the default file placement policy is to assign all files to the
system storage pool. You can go back to the default policy by running the command:
mmchpolicy Device DEFAULT
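For example, a hypothetical policy file can be validated, installed, and later replaced by the default policy with commands like the following (the file name is a placeholder):
mmchpolicy fs1 /tmp/placement.pol -I test
mmchpolicy fs1 /tmp/placement.pol
mmchpolicy fs1 DEFAULT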
For more information on using GPFS commands to manage policies, see “Managing policies” on page
399.
Policy rules
A policy rule is an SQL-like statement that tells GPFS what to do with the data for a file in a specific
storage pool if the file meets specific criteria. A rule can apply to any file being created or only to files
being created within a specific fileset or group of filesets.
A policy rule specifies one or more conditions that, when true, cause the rule to be applied. Conditions
can be specified by SQL expressions, which can include SQL functions, variables, and file attributes. Some
of the many available file attributes are shown in the following list. For more information, see “File
attributes in SQL expressions” on page 379:
v Date and time when the rule is evaluated, that is, the current date and time
v Date and time when the file was last accessed
v Date and time when the file was last modified
v Fileset name
v File name or extension
v File size
v User ID and group ID
Note: Some file attributes are not valid in all types of policy rules.
GPFS evaluates policy rules in order, from first to last, as they appear in the policy. The first rule that
matches determines what is to be done with that file. For example, when a client creates a file, GPFS
scans the list of rules in the active file placement policy to determine which rule applies to the file. When
a rule applies to the file, GPFS stops processing the rules and assigns the file to the appropriate storage
pool. If no rule applies, an EINVAL error code is returned to the application.
There are nine types of policy rules that allow you to define specific actions that GPFS will implement on the file data. Each rule has clauses that control candidate selection, namely when the rule is allowed to match a file, what files it will match, the order in which to operate on the matching files, and additional attributes to show for each candidate file. Different clauses are permitted on different rules based upon the semantics of the rule.
The policy rules and their respective syntax diagrams are as follows. For more information about
encryption-specific rules, see Chapter 36, “Encryption,” on page 565.
v File placement rules
RULE [’RuleName’]
SET POOL ’PoolName’
[LIMIT (OccupancyPercentage)]
[REPLICATE (DataReplication)]
[FOR FILESET (’FilesetName’[,’FilesetName’]...)]
[ACTION (SqlExpression)]
[WHERE SqlExpression]
v Snapshot placement rule
RULE [’RuleName’]
SET SNAP_POOL ’PoolName’
[LIMIT (OccupancyPercentage)]
[REPLICATE (DataReplication)]
[FOR FILESET (’FilesetName’[,’FilesetName’]...)]
[ACTION (SqlExpression)]
[WHERE SqlExpression]
v Group pool rule; used to define a list of pools that may be used as a pseudo-pool source or destination
in either a FROM POOL or TO POOL clause within another rule
RULE [’RuleName’] GROUP POOL [’GroupPoolName’]
IS ’pool-A’ [LIMIT(OccupancyPercentage)]
THEN ’pool-B’ [LIMIT(n2)]
THEN ’pool-C’ [LIMIT(n3)]
THEN ...
v External pool definition rule
RULE [’RuleName’]
EXTERNAL POOL ’PoolName’
EXEC ’InterfaceScript’
[OPTS ’OptionsString ...’]
[ESCAPE ’%SpecialCharacters’]
[SIZE sum-number]
v External list definition rule
RULE [’RuleName’]
EXTERNAL LIST ’ListName’
EXEC ’InterfaceScript’
[OPTS ’OptionsString ...’]
[ESCAPE ’%SpecialCharacters’]
[THRESHOLD ’ResourceClass’]
[SIZE sum-number]
The following terms are used in policy rules. Some terms appear in more than one rule:
ACTION (SqlExpression)
Specifies an SQL expression that is evaluated only if the other clauses of the rule are satisfied. The
action of the SqlExpression is completed, and the resulting value of the SqlExpression is discarded. In
the following example, the rule sets the extended attribute “user.action” to the value “set pool s6” for
files that begin with the characters “sp”. These files are assigned to the system pool:
rule ’s6’ set pool ’system’ action(setxattr(’user.action’,’set pool s6’)) where name like ’sp%’
Note: Compression with the z compression library is intended primarily for cold data and favors
saving space over access speed. Compression with the lz4 compression library is intended primarily
for active data and favors access speed over saving space.
yes
Files that are uncompressed are to be compressed with the z compression library. Files that are
already compressed are not affected.
no Files that are compressed are to be decompressed. Files that are already uncompressed are not
affected.
z Files that are uncompressed are to be compressed with the z compression library. Files that are
already compressed with the lz4 library are to be recompressed with the z library. Files that are
already compressed with the z library are not affected.
lz4
Files that are uncompressed are to be compressed with the lz4 compression library. Files that are
already compressed with the z library are to be recompressed with the lz4 library. Files that are
already compressed with the lz4 library are not affected.
The following rule compresses the files in the pool datapool whose names begin with the string green. Because the policy term COMPRESS specifies yes instead of a compression library, compression is done with the default compression library, which is the z library.
RULE ’COMPR1’ MIGRATE FROM POOL ’datapool’ COMPRESS(’yes’) WHERE NAME LIKE ’green%’
For more information, see the topic File compression in the IBM Spectrum Scale: Administration Guide.
Both rules specify that all characters except the “unreserved” characters in the set a-zA-Z0-9-_.~ are
encoded as %XX, where XX comprises two hexadecimal digits.
However, the GPFS ESCAPE syntax adds to the set of “unreserved” characters. In the first rule, the
syntax ESCAPE ’%’ specifies a rigorous RFC3986 encoding. Under this rule, a path name such as
/root/directory/@abc+def#ghi.jkl appears in a file list in the following format:
%2Froot%2Fdirectory%2F%40abc%2Bdef%23ghi.jkl
In the second rule, the syntax ESCAPE ’%/+@#’ specifies that none of the characters in set /+@# are
escaped. Under this rule, the same path name appears in a file list in the following format:
/root/directory/@abc+def#ghi.jkl
If you omit the ESCAPE clause, the newline character is escaped as ’\n’, and the backslash character
is escaped as ’\\’; all other characters are presented as is, without further encoding.
EXCLUDE
Identifies a file exclusion rule.
RULE ’x’ EXCLUDE
A file that matches this form of the rule is excluded from further consideration by any MIGRATE
or DELETE rules that follow.
RULE 'rule-name' LIST ’listname-y’ EXCLUDE
A file that matches this form of the rule is excluded from further consideration by any LIST rules
that name the same listname-y.
EXEC 'InterfaceScript'
Specifies an external program to be invoked to pass requests to an external storage management
application. InterfaceScript must be a fully qualified path name to a user-provided script or program
that supports the commands described in “User-provided program for managing external pools” on
page 404.
EXTERNAL LIST ListName
Defines an external list. This rule does not match files. It provides the binding between the lists that
are generated with regular LIST rules with a matching ListName and the external program that you
want to run with these lists as input.
EXTERNAL POOL PoolName
Defines an external storage pool. This rule does not match files but defines the binding between the
policy language and the external storage manager that implements the external storage.
FOR FILESET ('FilesetName'[,'FilesetName']...)
Specifies that the rule applies only to files within the specified filesets.
FROM POOL FromPoolName
Specifies the name of the source pool from which files are candidates for migration.
GROUP POOL PoolName
Defines a group pool. This rule supports the concept of distributing data files over several GPFS disk
pools.
Optionally, a LIMIT, expressed as an occupancy percentage, can be specified for each disk pool; if it is not specified, the limit defaults to 99%. The THEN keyword signifies that disk pools that are specified
before a THEN keyword are preferred over disk pools that are specified after. When a pool that is
defined by a GROUP POOL rule is the TO POOL target of a MIGRATE rule, the selected files are
distributed among the disk pools that comprise the group pool. Files of highest weight are put into
the most preferred disk pool up to the occupancy limit for that pool. If more files must be migrated,
they are put into the second most preferred pool up to the occupancy limit for that pool. Again, files
of highest weight are selected.
If you specify a pool that is defined by a GROUP POOL rule in a FROM POOL clause, the clause matches any file in any of the disk pools in the group pool.
You can “repack” a group pool by WEIGHT. Migrate files of higher weight to preferred disk pools
by specifying a group pool as both the source and the target of a MIGRATE rule.
rule ’grpdef’ GROUP POOL ’gpool’ IS ’ssd’ LIMIT(90) THEN ’fast’ LIMIT(85) THEN ’sata’
rule ’repack’ MIGRATE FROM POOL ’gpool’ TO POOL ’gpool’ WEIGHT(FILE_HEAT)
See “Tracking file access temperature within a storage pool” on page 365.
LIMIT (OccupancyPercentage)
Limits the creation of data in a storage pool. GPFS does not migrate a file into a pool if doing so
exceeds the occupancy percentage for the pool. If you do not specify an occupancy percentage for a
pool, the default value is 99%. See “Phase two: Choosing and scheduling files” on page 392.
You can specify OccupancyPercentage as a floating point number, as in the following example:
RULE ’r’ RESTORE to pool ’x’ limit(8.9e1)
For testing or planning purposes, and when you use the mmapplypolicy command with the -I defer
or -I test option, you can specify a LIMIT larger than 100%.
The limit clause does not apply when the target TO POOL is a GROUP POOL. The limits that are
specified in the rule that defines the target GROUP POOL govern the action of the MIGRATE rule.
LIST ListName
Identifies a file list generation rule. A file can match more than one list rule but appears in a list only
once. ListName provides the binding to an EXTERNAL LIST rule that specifies the executable
program to call when the generated list is processed.
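For example, the following hypothetical pair of rules (the interface script path is a placeholder) lists files larger than 1 GiB and passes the list to an external program:
RULE ’bigdef’ EXTERNAL LIST ’bigfiles’ EXEC ’/usr/local/bin/processBigFiles’
RULE ’bigsel’ LIST ’bigfiles’ WHERE FILE_SIZE > 1073741824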
MIGRATE
Identifies a file migration rule. A file that matches this rule becomes a candidate for migration to the
pool specified by the TO POOL clause.
OPTS 'OptionsString ...'
Specifies optional parameters to be passed to the external program defined with the EXEC clause.
OptionsString is not interpreted by the GPFS policy engine.
REPLICATE (DataReplication)
Overrides the default data replication factor. This value must be specified as 1, 2, or 3.
RESTORE TO POOL PoolName
Identifies a file restore rule. When you restore a file with the gpfs_fputattrswithpathname() subroutine, you can use this rule to match files against their saved attributes rather than the current file attributes. This rule also applies to a command that uses that subroutine, such as the IBM Spectrum Protect command dsmc restore.
RULE ['RuleName']
Initiates the rule statement. RuleName identifies the rule and is used in diagnostic messages.
Note: The pool is only set when the file data is written to the snapshot, not when the snapshot is
created.
SHOW (['String'] SqlExpression)
Inserts the requested information (the character representation of the evaluated SQL expression
SqlExpression) into the candidate list that is created by the rule when it deals with external storage
pools. String is a literal value that gets echoed back.
This clause has no effect in matching files but can be used to define other attributes to be exported
with the candidate file lists.
SIZE (numeric-sql-expression)
Is an optional clause of any MIGRATE, DELETE, or LIST rules that are used for choosing candidate files. numeric-sql-expression specifies the size of the file to be used when calculating the total amount of data to be passed to a user script. The default is KB_ALLOCATED.
SIZE sum-number
Is an optional clause of the EXTERNAL POOL and EXTERNAL LIST rules. sum-number limits the
total number of bytes in all of the files named in each list of files passed to your EXEC 'script'. If a
single file is larger than sum-number, it is passed to your EXEC 'script' as the only entry in a
“singleton” file list.
Specify sum-number as a numeric constant or a floating-point value.
Notes:
1. Percentage values can be specified as numeric constants or floating-point values.
2. This option applies only when you migrate to the external storage pool.
3. This option does not apply when the current rule operates on one group pool.
THRESHOLD (ResourceClass)
Specifies the type of capacity-managed resources that are associated with ListName. The following
values are valid:
FILESET_QUOTAS
Indicates that the LIST rule must use the occupancy percentage of the “hard limit” fileset
quota per the mmlsquota and mmedquota commands.
FILESET_QUOTA_SOFT
Indicates that the LIST rule must use the occupancy percentage of the “soft limit” fileset
quota per the mmlsquota and mmedquota commands.
GROUP_QUOTAS
Indicates that the LIST rule must use the occupancy percentage of the “hard limit” group
quota per the mmlsquota and mmedquota commands.
GROUP_QUOTA_SOFT
Indicates that the LIST rule must use the occupancy percentage of the “soft limit” group
quota per the mmlsquota and mmedquota commands.
POOL_CAPACITIES
Indicates that the LIST rule uses the occupancy percentage of the pool when it applies the
threshold rule. This value is the default value. This value is used if the threshold is not
specified in the EXTERNAL LIST rule but appears in the LIST rule.
USER_QUOTAS
Indicates that the LIST rule uses the occupancy percentage of the “hard limit” user quota per
the mmlsquota and mmedquota commands.
USER_QUOTA_SOFT
Indicates that the LIST rule uses the occupancy percentage of the “soft limit” user quota per
the mmlsquota and mmedquota commands.
Note: This option does not apply when the current rule operates on one group pool.
For more detail on how THRESHOLD can be used to control file migration and deletion, see “Phase
one: Selecting candidate files” on page 391 and “Pre-migrating files with external storage pools” on
page 407.
TO POOL ToPoolName
Specifies the name of the storage pool to which all the files that match the rule criteria are migrated.
This phrase is optional if the COMPRESS keyword is specified.
WEIGHT (WeightExpression)
Establishes an order on the matching files. Specifies an SQL expression with a numeric value that can
be converted to a double-precision floating point number. The expression can refer to any of the file
attributes and can include any constants and any of the available SQL operators or built-in functions.
WHEN (TimeBooleanExpression)
Specifies an SQL expression that evaluates to TRUE or FALSE, depending only on the SQL built-in
variable CURRENT_TIMESTAMP. If the WHEN clause is present and TimeBooleanExpression
evaluates to FALSE, the rule is skipped.
The mmapplypolicy command assigns the CURRENT_TIMESTAMP when it begins processing. It
uses either the actual Coordinated Universal Time date and time or the date specified with the -D
option.
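For example, a hypothetical rule that takes effect only during early-morning policy runs might use the HOUR SQL function:
RULE ’offHours’ WHEN (HOUR(CURRENT_TIMESTAMP) < 6) MIGRATE FROM POOL ’system’ WEIGHT(KB_ALLOCATED) TO POOL ’sp1’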
You can reference different file attributes as SQL variables and combine them with SQL functions and operators. Depending on the clause, the SQL expression must evaluate to TRUE or FALSE, a numeric value, or a character string. Not all file attributes are available to all rules.
SQL expressions can include file attributes that specify certain clauses.
The following file attributes can be used in SQL expressions specified with the WHERE, WEIGHT, and
SHOW clauses:
ACCESS_TIME
Specifies an SQL time stamp value for the date and time that the file was last accessed (POSIX atime).
See EXPIRATION_TIME.
BLOCKSIZE
Specifies the size, in bytes, of each block of the file.
CHANGE_TIME
Specifies an SQL time stamp value for the date and time that the file metadata was last changed
(POSIX ctime).
CLONE_DEPTH
Specifies the depth of the clone tree for the file.
CLONE_IS_PARENT
Specifies whether the file is a clone parent.
CLONE_PARENT_FILESETID
Specifies the fileset ID of the clone parent. The fileset ID is available only if
CLONE_PARENT_IS_SNAP is a nonzero value.
CLONE_PARENT_INODE
Specifies the inode number of the clone parent, or NULL if it is not a file clone.
CLONE_PARENT_IS_SNAP
Specifies whether the clone parent is in a snapshot.
CLONE_PARENT_SNAP_ID
Specifies the snapshot ID of the clone parent. The snapshot ID is available only if
CLONE_PARENT_IS_SNAP is a nonzero value.
CREATION_TIME
Specifies an SQL time stamp value that is assigned when a file is created.
DEVICE_ID
Specifies the ID of the device that contains the directory entry.
DIRECTORY_HASH
Can be used to group files within the same directory.
DIRECTORY_HASH is a function that maps every PATH_NAME to a number. All files within the
same directory are mapped to the same number and deeper paths are assigned to larger numbers.
DIRECTORY_HASH uses the following functions:
CountSubstr(BigString,LittleString)
Counts and returns the number of occurrences of LittleString in BigString.
HashToFloat(StringValue)
Is a hash function that returns a quasi-random floating point number ≥ 0 and < 1, whose value
depends on a string value. Although the result might appear random, HashToFloat(StringValue)
always returns the same floating point value for a particular string value.
The following rule lists the directory hash values for three directories:
RULE ’y’ LIST ’xl’ SHOW(DIRECTORY_HASH)
LIST ’xl’ /abc/tdir/randy1 SHOW(+3.49449638091027E+000)
LIST ’xl’ /abc/tdir/ax SHOW(+3.49449638091027E+000)
LIST ’xl’ /abc/tdir/mmPolicy.8368.765871DF/mm_tmp/PWL.12 SHOW(+5.21282524359412E+000)
LIST ’xl’ /abc/tdir/mmPolicy.31559.1E018912/mm_tmp/PWL.3 SHOW(+5.10672733094543E+000)
LIST ’xl’ /abc/tdir/mmPolicy.31559.1E018912/mm_tmp/PWL.2 SHOW(+5.10672733094543E+000)
The following rule causes files within the same directory to be grouped and processed together
during deletion. Grouping the files can improve the performance of GPFS directory-locking and
caching.
RULE ’purge’ DELETE WEIGHT(DIRECTORY_HASH) WHERE (deletion-criteria)
EXPIRATION_TIME
Specifies the expiration time of the file, expressed as an SQL time-stamp value. If the expiration time
of a file is not set, its expiration time is SQL NULL. You can detect such files by checking for
"EXPIRATION_TIME IS NULL".
Remember the following points:
v EXPIRATION_TIME is tracked independently from ACCESS_TIME and both values are
maintained for immutable files.
v Expiration time and indefinite retention are independent attributes. You can change the value of
either one without affecting the value of the other.
FILE_HEAT
Specifies the heat of the file based on the file access time and access size. For more information, see
/usr/lpp/mmfs/samples/ilm/README.
The calculation of FILE_HEAT depends partly on the value of the atime file attribute. The -S option of
the mmcrfs and mmchfs commands controls whether and when atime is updated. You can override this
setting temporarily with mount options that are specific to IBM Spectrum Scale. For more
information, see the topics mmchfs command and mmcrfs command in the IBM Spectrum Scale: Command
and Programming Reference and atime values in the IBM Spectrum Scale: Concepts, Planning, and
Installation Guide.
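For example, a migration rule might use FILE_HEAT as its weight so that the hottest files are moved first;
the rule and pool names here are placeholders:
RULE 'hot_first' MIGRATE FROM POOL 'silver' TO POOL 'gold' WEIGHT(FILE_HEAT)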
FILE_SIZE
Specifies the current size or length of the file, in bytes.
FILESET_NAME
Specifies the fileset where the path name for the files is located, or is to be created.
Note: Using the FOR FILESET clause has the same effect and is more efficient to evaluate.
GENERATION
Specifies a number that is incremented whenever an INODE number is reused.
MISC_ATTRIBUTES
Specifies various miscellaneous file attributes. The value is a string of characters that are defined as
follows:
+ File access is controlled by an Access Control List (ACL).
a The file is appendOnly.
A Archive.
c The file is selected to be compressed.
D Directory. To match all directories, you can use the pattern '%D%' in a LIKE comparison with MISC_ATTRIBUTES.
e Encrypted. A Microsoft Windows file attribute. Does not refer to IBM Spectrum Scale file
encryption.
E The file has extended-attribute metadata.
f Some data blocks of the file are ill-placed with respect to the File Placement Optimizer (FPO)
attributes of the file.
F Regular data file.
H Hidden. A Microsoft Windows file attribute.
i Not indexed by content. A Microsoft Windows file attribute.
I Some data blocks might be ill-placed.
j AFM append flag.
J Some data blocks might be ill-replicated.
k Remote attributes present. Internal to AFM.
K Some data blocks might be ill-compressed.
L Symbolic link.
m Empty directory.
M Co-managed.
2 Data blocks are replicated.
o Offline.
O Other (not F, D, nor L). For example, a device or named pipe.
p Reparse point. A Microsoft Windows file attribute.
P Active File Management (AFM) summary flag. Indicates that at least one specific AFM flag is set:
j, k, u, v, w, x, y, or z.
r Has streams. A Microsoft Windows file attribute.
R Read-only.
s Sparse. A Microsoft Windows file attribute.
S System. A Microsoft Windows file attribute.
t Temporary. A Microsoft Windows file attribute.
u File is cached. Internal to AFM.
U The file is trunc-managed.
v AFM create flag.
V Read-managed.
w AFM dirty data flag.
W Write-managed.
x AFM hard-linked flag.
X Immutability.
y AFM attribute-changed flag.
Y Indefinite retention.
z AFM local flag.
Z Secure deletion.
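For example, individual flags can be tested with a LIKE comparison against MISC_ATTRIBUTES; the
following illustrative rule (the list name is a placeholder) selects append-only files:
RULE 'append_only' LIST 'appendonly_files' WHERE MISC_ATTRIBUTES LIKE '%a%'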
MODIFICATION_SNAPID
Specifies the integer ID of the snapshot after which the file was last changed. The value is normally
derived with the SNAP_ID() built-in function that assigns integer values to GPFS snapshot names.
This attribute allows policy rules to select files that are modified after a snapshot image is taken.
MODIFICATION_TIME
Specifies an SQL time stamp value for the date and time that the file data was last modified (POSIX
mtime).
NAME
Specifies the name of a file.
NLINK
Specifies the number of hard links to the file.
POOL_NAME
Specifies the name of the storage pool where the file data resides.
Note: Using the FROM POOL clause has the same effect and is often preferable.
SNAP_NAME
Specifies the snapshot name that the snapshot file is part of.
Note: This attribute has an effect only when it is used in snapshot placement rules.
RDEVICE_ID
Specifies the device type for a device.
USER_ID
Specifies the numeric user ID of the owner of the file. To return the value of USER_ID when
USER_NAME returns NULL, use COALESCE(USER_NAME, VARCHAR(USER_ID)) .
USER_NAME
Specifies the user name that is associated with USER_ID.
Notes:
1. When file attributes are referenced in initial placement rules, only the following attributes are valid:
CREATION_TIME, FILESET_NAME, GROUP_ID, MODE, NAME, SNAP_NAME, and USER_ID.
The placement rules, like all rules with a clause, might also reference the current date and current
time and use them to control matching.
2. When file attributes are used for restoring files, the attributes correspond to the attributes at the time
of the backup, not to the current restored file.
3. For SQL expressions, if you want to show any of these attribute fields as strings (for example,
FILE_HEAT), use SHOW('[FILE_HEAT]') rather than SHOW('FILE_HEAT'), as the latter is expanded.
4. All date attributes are evaluated in Coordinated Universal Time (a time standard abbreviated as UTC).
5. To test whether a file is encrypted by IBM Spectrum Scale, do one of the following actions:
v In a policy, use the following condition:
XATTR(’gpfs.Encryption’) IS NOT NULL
v On the command line, issue the following command:
mmlsattr -L FileName
With GPFS, you can use built-in functions in comparison predicates, BETWEEN predicates, IN predicates,
LIKE predicates, mathematical value expressions, and boolean, string, and numeric literals.
You can use these functions to support access to the extended attributes of a file, and to support
conversion of values to the supported SQL data types.
The following attribute functions can be used:
GetXattrs(pattern,prototype)
Returns extended attribute key=value pairs of a file for all extended attributes whose keys match
pattern. The key=value pairs are returned in the format specified by prototype.
If the value specified for pattern is '*' or empty then all keys are matched.
The prototype is a character string representing the format of a typical key=value pair. The prototype
allows the user to specify which characters will be used to quote values, escape special code points,
separate the key and value, and separate each key=value pair.
Some examples of the prototype argument include:
key~n=value^n, # specify the escape characters
hexkey=hexvalue, # specify either or both as hexadecimal values
"key\n"="value\n", # specify quotes on either or both
key:"value^n"; # specify alternatives to = and ,
k:"v^n"; # allow key and value to be abbreviated
key, # specify keys only
"value~n"; # specify values only
key=’value~n’& # alternative quoting character
key=value # do not use a ’,’ separator; use space instead
You may omit the last or both arguments. The defaults are effectively
GetXattrs('*','key^n=hexvalue,').
The GetXattrs function returns an empty string for files that have no extended attributes with keys
that match pattern.
The GetXattrs function is supported by the mmapplypolicy command, but it might return NULL in
other contexts.
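For example, an illustrative LIST rule (the list name is a placeholder) might record all extended attributes
of each selected file:
RULE 'with_xattrs' LIST 'xattr_report' SHOW(GetXattrs('*'))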
SetBGF(BlockGroupFactor)
Specifies how many file system blocks are laid out sequentially on disk to behave like a single large
block. This option only works if --allow-write-affinity is set for the data pool. This applies only to a
new data block layout; it does not migrate previously existing data blocks.
SetWAD(WriteAffinityDepth)
Specifies the allocation policy to be used. This option only works if --allow-write-affinity is set for
the data pool. This applies only to a new data block layout; it does not migrate previously existing
data blocks.
SetWADFG("WadfgValueString")
Indicates the range of nodes (in a shared nothing architecture) where replicas of blocks in the file are
to be written. You use this parameter to determine the layout of a file in the cluster so as to optimize
the typical access patterns of your applications. This applies only to a new data block layout; it does
not migrate previously existing data blocks.
"WadfgValueString" is a semicolon-separated string identifying one or more failure groups in the
following format:
FailureGroup1[;FailureGroup2[;FailureGroup3]]
where each FailureGroupx is a comma-separated string identifying the rack (or range of racks),
location (or range of locations), and node (or range of nodes) of the failure group in the following
format:
Rack1{:Rack2{:...{:Rackx}}},Location1{:Location2{:...{:Locationx}}},ExtLg1{:ExtLg2{:...{:ExtLgx}}}
For example, the following value
1,1,1:2;2,1,1:2;2,0,3:4
means that the first failure group is on rack 1, location 1, extLg 1 or 2; the second failure group is on
rack 2, location 1, extLg 1 or 2; and the third failure group is on rack 2, location 0, extLg 3 or 4.
Notes:
1. Only the end part of a failure group string can be left off. The missing end part may be the third
field only, or it may be both the second and third fields; however, if the third field is provided,
the second field must also be provided. The first field must always be provided. In other words,
every comma must both follow and precede a number; therefore, none of the following are valid:
2,0,
2,
,0,0
0,,0
,,0
2. Wildcard characters (*) are supported in these fields.
Here is an example of using setBGF, setWAD, and setWADFG:
RULE ’bgf’ SET POOL ’pool1’ WHERE NAME LIKE ’%’ AND setBGF(128) AND setWAD(1) AND setWADFG(1,0,1;2,0,1;3,0,1)
After installing this policy, a newly created file will have the same values for these three extended
attributes as it would if mmchattr were used to set them:
(06:29:11) hs22n42:/sncfs # mmlsattr -L test
file name: test
metadata replication: 3 max 3
data replication: 3 max 3
immutable: no
appendOnly: no
flags:
storage pool name: system
fileset name: root
snapshot name:
Block group factor: 128 -----------------gpfs.BGF
Write affinity depth: 1 -----------------gpfs.WAD
Write Affinity Depth Failure Group(FG) Map for copy:1 1,0,1 -----------------gpfs.WADFG
Write Affinity Depth Failure Group(FG) Map for copy:2 2,0,1
Write Affinity Depth Failure Group(FG) Map for copy:3 3,0,1
creation time: Sat Jun 8 06:28:50 2013
Misc attributes: ARCHIVE
SetXattr('ExtendedAttributeName', 'ExtendedAttributeValue')
This function sets the value of the specified extended attribute of a file.
Successful evaluation of SetXattr in a policy rule returns the value TRUE and sets the named
extended attribute to the specified value for the file that is the subject or object of the rule. This
function is effective for policy rules (like MIGRATE and LIST) that are evaluated by mmapplypolicy
and for the policy placement rule, SET POOL, when a data file is about to be created.
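For example, a placement rule might tag newly created files with a user-defined extended attribute; the
rule name, file name pattern, and attribute name here are illustrative only:
RULE 'tag_logs' SET POOL 'system' WHERE NAME LIKE '%.log' AND SetXattr('user.filetype','log')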
XATTR(extended-attribute-name [, start [, length]])
Returns the value of a substring of the extended attribute that is named by its argument as an SQL
VARCHAR value, where:
extended-attribute-name
Specifies any SQL expression that evaluates to a character string value. If the named extended
attribute does not exist, XATTR returns the special SQL value NULL.
Note: In SQL, the expression NULL || AnyValue yields NULL. In fact, with a few exceptions, the
special SQL value of NULL “propagates” throughout an SQL expression, to yield NULL. A
notable exception is that (expression) IS NULL always yields either TRUE or FALSE, never NULL.
For example, if you wish to display a string like _NULL_ when the value of the extended
attribute of a file is NULL you will need to code your policy rules file like this:
define(DISPLAY_NULL,[COALESCE($1,’_NULL_’)])
rule external list ’a’ exec ’’
rule list ’a’ SHOW(DISPLAY_NULL(xattr(’user.marc’)) || ’ and ’ || DISPLAY_NULL(xattr(’user.eric’)))
Here is an example execution, where either or both of the values of the two named extended
attributes may be NULL:
mmapplypolicy /gig/sill -P /ghome/makaplan/policies/display-null.policy -I test -L 2
...
WEIGHT(inf) LIST ’a’ /gg/sll/cc SHOW(_NULL_ and _NULL_)
WEIGHT(inf) LIST ’a’ /gg/sll/mm SHOW(yes-marc and _NULL_)
WEIGHT(inf) LIST ’a’ /gg/sll/bb SHOW(_NULL_ and yes-eric)
WEIGHT(inf) LIST ’a’ /gg/sll/tt SHOW(yes-marc and yes-eric)
Some extended attribute values represent numbers or timestamps as decimal or binary strings. Use the
TIMESTAMP, XATTR_FLOAT, or XATTR_INTEGER function to convert extended attributes to SQL
numeric or timestamp values:
XATTR_FLOAT(extended-attribute-name [, start [, length, [, conversion_option]]])
Returns the value of a substring of the extended attribute that is named by its argument, converted to
an SQL double floating-point value, where:
extended-attribute-name
Specifies any SQL expression that evaluates to a character string value. If the named extended
attribute does not exist, XATTR_FLOAT returns the special SQL value NULL.
start
Is the optional starting position within the extended attribute value. The default is 1.
length
Is the optional length, in bytes, of the extended attribute value to return. The default is the
number of bytes from the start to the end of the extended attribute string. You can specify length
as -1 to reach from the start to the end of the extended attribute string.
conversion_option
Specifies how the bytes are to be converted to a floating-point value. Supported options include:
v BIG_ENDIAN_DOUBLE or BD - a signed binary representation, IEEE floating, sign + 11 bit
exponent + fraction. This is the default when executing on a "big endian" host OS, such as AIX
on PowerPC®.
v BIG_ENDIAN_SINGLE or BS - IEEE floating, sign + 8-bit exponent + fraction.
v LITTLE_ENDIAN_DOUBLE or LD - bytewise reversed binary representation. This is the
default when executing on a "little endian" host OS, such as Linux on Intel x86.
v LITTLE_ENDIAN_SINGLE or LS - bytewise-reversed binary representation.
v DECIMAL - the conventional SQL character string representation of a floating-point value.
Notes:
1. Any prefix of a conversion name can be specified instead of spelling out the whole name. The
first match against the list of supported options is used; for example, L matches
LITTLE_ENDIAN_DOUBLE.
2. If the extended attribute does not exist, the selected substring has a length of 0, or the selected
bytes cannot be converted to a floating-point value, the function returns the special SQL value
NULL.
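For example, assuming a user-defined extended attribute named user.score that stores a decimal string, a
rule might display its numeric value as follows (rule and list names are placeholders):
RULE 'score_report' LIST 'scored' SHOW('score=' || varchar(XATTR_FLOAT('user.score',1,-1,'DECIMAL')))
WHERE XATTR('user.score') IS NOT NULL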
XATTR_INTEGER(extended-attribute-name [, start [, length, [, conversion_option]]])
Returns the value of (a substring of) the extended attribute named by its argument, converted to an
SQL LARGEINT value, where:
extended-attribute-name
Specifies any SQL expression that evaluates to a character string value. If the named extended
attribute does not exist, XATTR_INTEGER returns the special SQL value NULL.
start
Is the optional starting position within the extended attribute value. The default is 1.
length
Is the optional length, in bytes, of the extended attribute value to return. The default is the
number of bytes from the start to the end of the extended attribute string. You can specify length
as -1 to reach from the start to the end of the extended attribute string.
conversion_option
Specifies how the bytes are to be converted to a LARGEINT value. Supported options include:
v BIG_ENDIAN - a signed binary representation, most significant byte first. This is the default
when executing on a "big endian" host OS, such as AIX on PowerPC.
v LITTLE_ENDIAN - bytewise reversed binary representation. This is the default when
executing on a "little endian" host OS, such as Linux on Intel x86.
v DECIMAL - the conventional SQL character string representation of an integer value.
Notes:
1. Any prefix of a conversion name can be specified instead of spelling out the whole name (B,
L, or D, for example).
2. If the extended attribute does not exist, the selected substring has a length of 0, or the selected
bytes cannot be converted to a LARGEINT value, the function returns the special SQL value
NULL. For example:
XATTR_INTEGER(’xyz.jim’,5,-1,’DECIMAL’)
String functions:
You can use these string-manipulation functions on file names and literal values.
Important tips:
1. You must enclose strings in single-quotation marks.
2. You can include a single-quotation mark in a string by using two single-quotation marks. For
example, 'a''b' represents the string a'b.
CHAR(expr[, length])
Returns a fixed-length character string representation of its expr argument, where:
expr
Can be any data type.
length
If present, must be a literal, integer value.
The resulting type is CHAR or VARCHAR, depending upon the particular function called.
The string that CHAR returns is padded with blanks to fill the length of the string. If length is not
specified, it defaults to a value that depends on the type of the argument (expr).
Note: The maximum length of a CHAR (fixed length string) value is 255 bytes. The result of
evaluating an SQL expression whose result is type CHAR may be truncated to this maximum length.
CONCAT(x,y)
Concatenates strings x and y.
HEX(x)
Converts an integer x into hexadecimal format.
LENGTH(x)
Determines the length of the data type of string x.
LOWER(x)
Converts string x into lowercase.
REGEX(String,'Pattern')
Returns TRUE if the pattern matches, FALSE if it does not. Pattern is a Posix extended regular
expression.
Note: The policy SQL parser normally performs M4 macro preprocessing with square brackets set as
the quote characters. Therefore, it is recommended that you add an extra set of square brackets
around your REGEX pattern string; for example:
...WHERE REGEX(name,[’^[a-z]*$’]) /* only accept lowercase alphabetic file names */
The following SQL expression:
NOT REGEX(STRING_VALUE,[’^[^z]*$|^[^y]*$|^[^x]*$|[abc]’])
can be used to test if STRING_VALUE contains all of the characters x, y, and z, in any order, and none
of the characters a, b, or c.
REGEXREPLACE(string,pattern,result-prototype-string)
Returns a character string as result-prototype-string with occurrences of \i (where i is 0 through 9)
replaced by the substrings of the original string that match the ith parenthesis delimited parts of the
pattern string. For example:
REGEXREPLACE(’speechless’,[’([^aeiou]*)([aeiou]*)(.*)’],[’last=\3. middle=\2. first=\1.’])
When pattern does not match string, REGEXREPLACE returns the value NULL.
When a \0 is specified in the result-prototype-string, it is replaced by the substring of string that
matches the entire pattern.
SUBSTR(x,y,z)
Extracts a portion of string x, starting at position y, optionally for z characters (otherwise to the end
of the string). This is the short form of SUBSTRING. If y is a negative number, the starting position
is counted from the end of the string; for example, SUBSTR('ABCDEFGH',-3,2) == 'FG'.
Note: Do not confuse SUBSTR with substr. substr is an m4 built-in macro function.
VARCHAR(expr [, length])
Returns a varying-length character string representation of its expr argument; the arguments are the
same as for CHAR. The resulting type is CHAR or VARCHAR, depending upon the particular function
called. Unlike CHAR, the string that the VARCHAR function returns is not padded with blanks.
Note: The maximum length of a VARCHAR(variable length string) value is 8192 bytes. The result of
evaluating an SQL expression whose result is type VARCHAR may be truncated to this maximum
length.
Numerical functions:
You can use numeric-calculation functions to place files based on either numeric parts of the file name,
numeric parts of the current date, or UNIX-client user IDs or group IDs.
These functions can be used in combination with comparison predicates and mathematical infix operators
(such as addition, subtraction, multiplication, division, modulo division, and exponentiation).
INT(x)
Converts number x to a whole number, rounding up fractions of .5 or greater.
INTEGER(x)
Converts number x to a whole number, rounding up fractions of .5 or greater.
MOD(x,y)
Determines the value of x taken modulo y (x % y).
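For example, the following illustrative placement rules (pool names are placeholders) distribute files
between two pools based on the owner's user ID:
RULE 'even_uid' SET POOL 'pool_even' WHERE MOD(USER_ID,2) = 0
RULE 'odd_uid' SET POOL 'pool_odd' WHERE MOD(USER_ID,2) = 1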
Date and time functions:
You can use these date-manipulation and time-manipulation functions to place files based on when the
files are created and the local time of the GPFS node serving the directory where the file is being created.
CURRENT_DATE
Determines the current date on the GPFS server.
CURRENT_TIMESTAMP
Determines the current date and time on the GPFS server.
DAYOFWEEK(x)
Determines the day of the week from date or timestamp x. The day of a week is from 1 to 7 (Sunday
is 1).
DAYOFYEAR(x)
Determines the day of the year from date x. The day of a year is a number from 1 to 366.
DAY(x)
Determines the day of the month from date or timestamp x.
DAYS(x)
Determines the number of days between date or timestamp x and 0001-01-01.
DAYSINMONTH(x)
Determines the number of days in the month of date x.
DAYSINYEAR(x)
Determines the day of the year of date x.
HOUR(x)
Determines the hour of the day (a value from 0 to 23) of timestamp x.
MINUTE(x)
Determines the minute from timestamp x.
MONTH(x)
Determines the month of the year from date or timestamp x.
QUARTER(x)
Determines the quarter of year from date x. Quarter values are the numbers 1 through 4. For
example, January, February, and March are in quarter 1.
SECOND(x)
Returns the seconds portion of timestamp x.
TIMESTAMP(sql-numeric-value) or TIMESTAMP(sql-character-string-value)
Accepts any numeric value. The numeric value is interpreted as the number of seconds since January
1, 1970 (the standard UNIX epoch) and is converted to an SQL TIMESTAMP value.
Signed 64-bit LARGEINT argument values are supported. Negative argument values cause
TIMESTAMP to convert these values to timestamps that represent years before the UNIX epoch.
This function also accepts character strings of the form YYYY-MM-DD HH:MM:SS. A hyphen (-) or an
at sign (@) might appear instead of the blank between the date and the time. The time can be
omitted. An omitted time defaults to 00:00:00. The :SS field can be omitted, which defaults to 00.
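For example, the following illustrative rule deletes temporary files that have not been accessed since a
fixed date:
RULE 'old_tmp' DELETE WHERE PATH_NAME LIKE '%/tmp/%' AND ACCESS_TIME < TIMESTAMP('2014-01-01')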
WEEK(x)
Determines the week of the year from date x.
YEAR(x)
Determines the year from date or timestamp x.
Phase one: Selecting candidate files
Any given file is a potential candidate for at most one MIGRATE or DELETE operation during each
invocation of the mmapplypolicy command. A single invocation of the mmapplypolicy command is
called the job.
The mmapplypolicy command sets the SQL built-in variable CURRENT_TIMESTAMP, and collects pool
occupancy statistics at the beginning of the job.
| Tip: The mmapplypolicy command always does Phase one, even if no file data has changed and even if
| the purpose for running the command is only to rewrap encryption keys. This process can take a long
| time and can involve considerable system resources if the affected file system or fileset is very large. You
| might want to delay running mmapplypolicy until a time when the system is not running a heavy load of
| applications.
Note: mmapplypolicy reads directly from the metadata disk blocks and can therefore lag behind the
POSIX state of the file system. To be sure that MODIFICATION_TIME and the other timestamps are
completely up to date, you can use the following suspend-and-resume sequence to force recent changes
to disk:
mmfsctl fs-name suspend; mmfsctl fs-name resume;
For each file, the policy rules are considered, in order, from first rule to last:
v If the rule has a WHEN clause that evaluates to FALSE, the rule is skipped.
v If the rule has a FROM POOL clause, and the named pool does not match the POOL_NAME attribute
of the file, the rule is skipped. A FROM POOL clause that specifies a group pool name matches a file
if any pool name within the group pool matches the POOL_NAME attribute of the file.
v If there is a THRESHOLD clause and the current pool of the file has an occupancy percentage that is
less than the HighPercentage parameter of the THRESHOLD clause, the rule is skipped.
v If the rule has a FOR FILESET clause, but none of the named filesets match the FILESET_NAME
attribute of the file, the rule is skipped.
v If the rule has a WHERE clause that evaluates to FALSE, the rule is skipped. Otherwise, the rule
applies.
v If the applicable rule is a LIST ’listname-y’ rule, the file becomes a candidate for inclusion in the named
list unless the EXCLUDE keyword is present, in which case the file will not be a candidate; nor will
any following LIST ’listname-y’ rules be considered for the subject file. However, the file is subject to
LIST rules naming other list names.
v If the applicable rule is an EXCLUDE rule, the file will be neither migrated nor deleted. Files matching
the EXCLUDE rule are not candidates for any MIGRATE or DELETE rule.
Note: Specify the EXCLUDE rule before any other rules that might match the files that are being
excluded. For example:
RULE ’Exclude root’s file’ EXCLUDE where USER_ID = 0
RULE ’Migrate all but root’s files’ MIGRATE TO POOL ’pool1’
will migrate all the files that are not owned by root. If the MIGRATE rule was placed in the policy file
before the EXCLUDE rule, all files would be migrated because the policy engine would evaluate the
rules from first to last, and root's files would have to match the MIGRATE rule.
To exclude files from matching a LIST rule, you must create a separate LIST rule with the EXCLUDE
clause and place it before the LIST rule.
v If the applicable rule is a MIGRATE rule, the file becomes a candidate for migration to the pool
specified by the TO POOL clause.
When a group pool is the TO POOL target of a MIGRATE rule, the selected files are distributed
among the disk pools comprising the group pool, with files of highest weight going to the most
preferred disk pool up to the occupancy limit for that pool. If there are still more files to be migrated,
those go to the second most-preferred pool up to the occupancy limit for that pool (again choosing the
highest-weight files from among the remaining selected files); and so on for the subsequent
most-preferred pools, until either all selected files have been migrated or until all the disk pools of the
group pool have been filled to their respective limits.
v If the applicable rule is a DELETE rule, the file becomes a candidate for deletion.
v If there is no applicable rule, the file is not a candidate for migration or deletion.
v Each candidate file (for migration or deletion) is also associated with a LowPercentage occupancy
percentage value, which is taken from the THRESHOLD clause of the applicable rule. If not specified,
the LowPercentage value defaults to 0%.
v Each candidate file is also associated with a numeric weight, either computed from the WeightExpression
of the applicable rule, or assigned a default using these rules:
– If a LowPercentage is specified within a THRESHOLD clause of the applicable rule, the weight of the
candidate is taken as the KB_ALLOCATED attribute of the candidate file.
– If a LowPercentage is not specified within a THRESHOLD clause of the applicable rule, the weight of
the candidate is taken as +infinity.
Phase two: Choosing and scheduling files
Chosen files are scheduled for migration or deletion, taking into account the weights and thresholds
determined in “Phase one: Selecting candidate files” on page 391, as well as the actual pool occupancy
percentages. Generally, candidates with higher weights are chosen ahead of those with lower weights.
File migrations to and from external pools are done before migrations and deletions that involve only
GPFS disk pools.
File migrations that do not target group pools are done before file migrations to group pools.
The following two options can be used to adjust the method by which candidates are chosen:
--choice-algorithm {best | exact | fast}
Specifies one of the following types of algorithms that the policy engine is to use when selecting
candidate files:
best
Chooses the optimal method based on the rest of the input parameters.
exact
Sorts all of the candidate files completely by weight, then serially considers each file from highest
weight to lowest weight, choosing feasible candidates for migration, deletion, or listing according
to any applicable rule LIMITs and current storage-pool occupancy. This is the default.
fast
Works together with the parallelized -g /shared-tmp -N node-list selection method. The fast
choice method does not completely sort the candidates by weight. It uses a combination of
statistical, heuristic, and parallel computing methods to favor higher weight candidate files over
those of lower weight, but the set of chosen candidates may be somewhat different than those of
the exact method, and the order in which the candidates are migrated, deleted, or listed is
somewhat more random. The fast method uses statistics gathered during the policy evaluation
phase. The fast choice method is especially fast when the collected statistics indicate that either
all or none of the candidates are feasible.
--split-margin n.n
A floating-point number that specifies the percentage within which the fast-choice algorithm is
allowed to deviate from the LIMIT and THRESHOLD targets specified by the policy rules. For
example if you specified a THRESHOLD number of 80% and a split-margin value of 0.2, the
fast-choice algorithm could finish choosing files when it reached 80.2%, or it might choose files that
bring the occupancy down to 79.8%. A nonzero value for split-margin can greatly accelerate the
execution of the fast-choice algorithm when there are many small files. The default is 0.2.
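For example, a run that combines the parallel selection method with the fast choice algorithm and a
wider split margin might look like the following; the file system name, policy file, and node list are
placeholders:
mmapplypolicy fs1 -P policyfile -g /shared-tmp -N node1,node2 --choice-algorithm fast --split-margin 0.5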
When scheduling files, mmapplypolicy simply groups together either the next 100 files by default, or the
number of files explicitly set using the -B option.
However, you can set up mmapplypolicy to schedule files so that each invocation of the InterfaceScript
gets approximately the same amount of file data to process. To do so, use the SIZE clause of certain
policy rules to specify that scheduling be based on the sum of the sizes of the files. The SIZE clause can
be applied to the following rules (for details, see “Policy rules” on page 371):
v DELETE
v EXTERNAL LIST
v EXTERNAL POOL
v LIST
v MIGRATE
In addition to using the SIZE clause to control the amount of work passed to each invocation of an
InterfaceScript, you can also specify that files with similar attributes be grouped or aggregated together
during the scheduling phase. To do so, use an aggregator program to take a list of chosen candidate files,
sort them according to certain attributes, and produce a reordered file list that can be passed as input to
the user script.
Note: You can also use the -q option to specify that small groups of files are to be taken in
round-robin fashion from the input file lists (for example, take a small group of files from x.list.A,
then from x.list.B, then from x.list.C, then back to x.list.A, and so on, until all of the files have been
processed).
To prevent mmapplypolicy from redistributing the grouped files according to size, omit the SIZE
clause from the appropriate policy rules and set the bunching parameter of the -B option to a very
large value.
Generally, a candidate is not chosen for deletion from a pool, nor migration out of a pool, when the pool
occupancy percentage falls below the LowPercentage value. Also, candidate files will not be chosen for
migration into a target TO POOL when the target pool reaches the occupancy percentage specified by the
LIMIT clause (or 99% if no LIMIT was explicitly specified by the applicable rule).
The LIMIT clause does not apply when the target TO POOL is a group pool; the limits specified in the
rule defining the target group pool govern the action of the MIGRATE rule. The policy-interpreting
program (for example, mmapplypolicy) may issue a warning if a LIMIT clause appears in a rule whose
target pool is a group pool.
For migrations, if the applicable rule had a REPLICATE clause, the replication factors are also adjusted
accordingly. It is acceptable for the effective FROM POOL and TO POOL to be the same because the
mmapplypolicy command can be used to adjust the replication factors of files without necessarily
moving them from one pool to another.
The migration performed in the third phase can involve large amounts of data movement. Therefore, you
may want to consider using the –I defer option of the mmapplypolicy command, and then perform the
data movements with the mmrestripefs -p command.
It is a good idea to test your policy rules by running the mmapplypolicy command with the -I test
option and the -L 3 or higher option. Testing helps you understand which files are selected as candidates
and which candidates are chosen to be processed.
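For example, a test run might look like the following; the file system name and policy file are
placeholders:
mmapplypolicy fs1 -P policyfile -I test -L 3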
The output shows which files are scanned and which match rules or no rules. If a problem is not
apparent, you can add a SHOW() clause to your rule to see the values of file attributes or SQL
expressions. To see multiple values, enter a command like the following one:
SHOW(’x1=’ || varchar(Expression1) || ’ x2=’ || varchar(Expression2) || ... )
where ExpressionX is the SQL variable, expression, or function that you suspect or do not understand.
Be aware that if any expression evaluates to SQL NULL, then the entire SHOW clause is NULL, by the rules
of SQL. One way to show null vs. non-null values is to define a macro and call it as in the following
example:
define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN ’_NULL_’ ELSE varchar($1) END])
Note: For examples and more information on the -L flag, see the topic The mmapplypolicy -L command in
the IBM Spectrum Scale: Problem Determination Guide.
CURRENT_DATE is an SQL built-in operand that returns the date portion of the
CURRENT_TIMESTAMP value.
7. Use the SQL IN operator to test several possibilities:
RULE ’D_WEEKEND’ WHEN (DayOfWeek(CURRENT_DATE) IN (7,1)) /* Saturday or Sunday */
DELETE WHERE PATH_NAME LIKE ’%/tmp/%’
For information on how to use a macro processor such as m4 to make reading and writing policy
rules easier, see “Using macro processing utilities with policy rules” on page 398.
8. Use a FILESET clause to restrict the rule to files within particular filesets:
RULE ’fsrule1’ MIGRATE TO POOL ’pool_2’
FOR FILESET(’root’,’fset1’)
In this example there is no FROM POOL clause, so regardless of their current storage pool
placement, all files from the named filesets are subject to migration to storage pool pool_2.
Note: To have the migrate rule applied to snapshot files, you must specify the mmapplypolicy fs -S
snap1 option, where snap1 is the name of the snapshot where the files reside.
9. Use an EXCLUDE rule to exclude a set of files from all subsequent rules:
RULE ’Xsuper’ EXCLUDE WHERE USER_ID=0
RULE ’mpg’ DELETE WHERE lower(NAME) LIKE ’%.mpg’ AND FILE_SIZE>20123456
Notes:
a. Specify the EXCLUDE rule before rules that might match the files that are being excluded.
b. You cannot define a list and what to exclude from the list in a single rule. You must define two
LIST statements, one specifying which files are in the list and one specifying what to exclude
from the list. For example, to exclude files that contain the word test from the LIST rule allfiles,
define the following rules:
RULE EXTERNAL LIST ’allfiles’ EXEC ’/u/brownap/policy/CHE/exec.list’
RULE ’exclude_test’ LIST ’allfiles’ EXCLUDE WHERE NAME LIKE ’%test%’
RULE ’all_other’ LIST ’allfiles’
where:
access_age is DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)
file_size is FILE_SIZE or KB_ALLOCATED
X and Y are weight factors that are chosen by the system administrator.
15. The WEIGHT clause can be used to express ideas like this (stated informally):
IF access_age > 365 days THEN weight = 100000 + access_age
ELSE IF access_age < 30 days THEN weight = 0
ELSE weight= KB_ALLOCATED
This rule means:
v Give a very large weight bias to any file older than a year.
v Force the weight of any file younger than 30 days to 0.
v Assign weights to all other files according to the number of kilobytes occupied.
The following code block shows the formal SQL syntax:
CASE
WHEN DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 365
THEN 100000 + DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)
WHEN DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) < 30
THEN 0
ELSE
KB_ALLOCATED
END
16. The SHOW clause has no effect in matching files but can be used to define additional attributes to
be exported with the candidate file lists. It can be used for any purpose but is primarily used to
support file aggregation.
To support aggregation, you can use the SHOW clause to output an aggregation value for each file
that is selected by a rule. You can then output those values to a file list and input that list to an
external program that groups the files into aggregates.
17. If you have a large number of filesets against which to test, use the FILESET_NAME variable as
shown in the following example:
RULE ’x’ SET POOL ’gold’ WHERE FILESET_NAME LIKE ’xyz.%.xyz’
However, if you are testing against just a few filesets, you can use the FOR FILESET('xyz1', 'xyz2')
form instead.
18. You can convert a time interval value to a number of seconds with the SQL cast syntax, as in the
following example:
define([toSeconds],[(($1) SECONDS(12,6))])
define([toUnixSeconds],[toSeconds($1 - ’1970-1-1@0:00’)])
To implement this policy, enter the following commands. The third line converts the time stamp to
UTC format.
LC=’2017-02-21 04:56 IST’
echo $LC
LCU=$(date +%Y-%m-%d" "%H:%M -d "$LC" -u)
echo $LCU
mmapplypolicy gpfs0 -P policy -I defer -f /tmp -M LAST_CREATE="$LCU"
Using macro processing utilities with policy rules
Before the policy rules are evaluated, the policy file is passed through the m4 macro processor. This
processing allows you to incorporate into the policy file some of the traditional m4 facilities and to
define simple and parameterized macros, conditionally include text, perform conditional evaluation,
perform simple string operations, perform simple integer arithmetic, and much more.
Note: GPFS uses the m4 built-in changequote macro to change the quote pair to [ ] and the changecom
macro to change the comment pair to /* */ (as in the C programming language).
Utilizing m4 as a front-end processor simplifies writing policies and produces policies that are easier to
understand and maintain. Here is Example 15 on page 397 from “Policy rules: Examples and tips” on
page 394 written with a few m4 style macro definitions:
define(weight_expression,
CASE
WHEN access_age > 365
THEN 100000 + access_age
WHEN access_age < 30
THEN 0
ELSE
KB_ALLOCATED
END
)
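The macro can then be used wherever a WEIGHT clause is allowed, for example (a sketch that assumes
access_age is also defined as a macro, and that uses placeholder pool names):
define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
RULE 'bak' MIGRATE FROM POOL 'system' TO POOL 'pool1' WEIGHT(weight_expression) WHERE access_age > 30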
If you would like to use megabytes or gigabytes instead of kilobytes to represent file sizes, and
SUNDAY, MONDAY, and so forth instead of 1, 2, and so forth to represent the days of the week, you
can use macros and rules like this:
define(MB_ALLOCATED,(KB_ALLOCATED/1024.0))
define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0))
define(SATURDAY,7)
define(SUNDAY,1)
define(MONDAY,2)
define(DAY_OF_WEEK, DayOfWeek(CURRENT_DATE))
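A rule might then read, for example (the pool name and the size threshold are placeholders):
RULE 'weekend_migrate' MIGRATE TO POOL 'pool_2' WHERE DAY_OF_WEEK = SUNDAY AND MB_ALLOCATED > 500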
The mmapplypolicy command provides a -M option that can be used to specify m4 macro definitions
when the command is invoked. The policy rules may include variable identifiers whose values can be set
using one or more -M options on the mmapplypolicy command. The policy rules could then compare file
attributes to the currently provided values for the macro defined variables.
Among other things, this allows you to create a single policy file and reuse it for incremental backups
without editing the file for each backup. For example, if your policy file contains the rules:
RULE EXTERNAL POOL ’archive’ EXEC ’/opts/hpss/archiveScript’ OPTS ’-server archive_server’
RULE ’mig1’ MIGRATE TO POOL ’dead’ WHERE ACCESS_TIME < TIMESTAMP(deadline)
RULE ’bak1’ MIGRATE TO POOL ’archive’ WHERE MODIFICATION_SNAPID > last_snapid
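For example, the macro values could then be supplied when the command is run; the file system name
and policy file path are placeholders:
mmapplypolicy fs1 -P policyfile -I defer -M "deadline='2006-11-30'" -M "last_snapid=SNAP_ID('2006_DEC')" -M "archive_server=archive.abc.com"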
The "mig1" rule will migrate old files that were not accessed since 2006/11/30 to an online pool named
"dead". The "bak1" rule will migrate files that have changed since the 2006_DEC snapshot to an external
pool named "archive". When the external script /opts/hpss/archiveScript is invoked, its arguments will
include "-server archive.abc.com".
Managing policies
Policies and the rules that they contain are used to assign files to specific storage pools.
A storage pool typically contains a set of volumes that provide a specific quality of service for a specific
use, such as to store all files for a particular application or a specific business division.
Creating a policy
Create a text file for your policy by following these guidelines.
v A policy must contain at least one rule.
v A policy file is limited to a size of 1 MB.
v When a file placement policy is applied to a file, the policy engine scans the list of rules in the policy
in order, starting at the top, to determine which rule applies to the file. When the policy engine finds a
rule that applies to the file, it stops processing the rules and assigns the file to the appropriate storage
pool. If no rule applies, the policy engine returns an EINVAL error code to the application.
Note: The last placement rule of a policy rule list should be in the following form so that the file is
assigned to a default pool if no other placement rule applies:
RULE 'DEFAULT' SET POOL 'default-data-pool'
For file systems that are upgraded to V4.1.1 or later: If there are no SET POOL policy rules installed
to a file system by mmchpolicy, the system acts as if the single rule SET POOL 'first-data-pool' is in
effect, where first-data-pool is the firstmost non-system pool that is available for file data storage, if such
a non-system pool is available. (“Firstmost” is the first according to an internal index of all pools.)
However, if there are no policy rules installed and there is no non-system pool, the system acts as if
SET POOL 'system' is in effect.
For file systems that are upgraded to V4.1.1: Until a file system is upgraded, if no SET POOL rules
are present (set by mmchpolicy) for the file system, all data is stored in the 'system' pool.
v Comments within a policy must start with a /* and end with a */:
/* This is a comment */
For more information, see the topic “Policy rules” on page 371.
Installing a policy
Install a policy by following these guidelines.
To install a policy:
1. Create a text file containing the desired policy rules.
2. Issue the mmchpolicy command.
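For example (the file system name and policy file path are placeholders):
mmchpolicy fs1 /tmp/policyfile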
Using thresholds to migrate data between pools
Tiered storage solutions can avoid NO_SPACE events by monitoring file system space usage and
migrating data to other storage pools when the system exceeds a specified threshold. Policies can be used
to automate this data movement.
A NO_SPACE event is generated if the file system is out of space. A lowDiskSpace event is generated
only if a threshold is specified.
GPFS provides user exits for NO_SPACE and lowDiskSpace events. Using the mmaddcallback
command, you can specify a script that runs when either of these events occurs. For more information,
see the topic mmaddcallback command in the IBM Spectrum Scale: Command and Programming Reference.
The file with the policy rules used by mmapplypolicy is the one that is currently installed in the file
system. It is a good idea for the HSM user to define migration or deletion rules to reduce the usage in
each online storage pool. Migration rules that are defined with a high and low THRESHOLD establish
the threshold that is used to signal the lowDiskSpace event for that pool. Because more than one
migration rule can be defined, the threshold for a pool is the minimum of the high thresholds set by the
rules for that pool. Each pool has its own threshold. Pools without migration rules do not signal a
lowDiskSpace event.
In order to enable the low space events required for the policy to work, the enableLowspaceEvents
global parameter must be set to yes. To view the current value of this setting, run mmlsconfig; to set it,
run mmchconfig enableLowspaceEvents=yes.
Note: GPFS must be restarted on all nodes in order for this setting to take effect.
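For example (a minimal sketch):
mmlsconfig
mmchconfig enableLowspaceEvents=yes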
A callback must be added in order to trigger the policy run when the low space event is generated. A
simple way to add the callback is using the mmstartpolicy command.
To add a callback, run the following command (shown here on two lines, but entered as one line):
mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace
--parms "%eventName %fsName --single-instance"
The --single-instance flag is required to avoid running multiple migrations on the file system at the same
time.
Policy changes take effect immediately on all nodes that have the affected file system mounted. For nodes
that do not have the file system mounted, policy changes take effect upon the next mount of the file
system.
Listing policies
When you use the mmlspolicy command to list policies, follow these guidelines.
The mmlspolicy command displays policy information for a given file system. The information displayed
is:
v When the policy file was installed.
v The user who installed the policy file.
v The first line of the original policy file.
The mmlspolicy -L command returns the installed (original) policy file. This shows all the rules and
comments as they were in the policy file when it was installed. This is useful if you want to change
policy rules - simply retrieve the original policy file using the mmlspolicy -L command and edit it.
Validating policies
When you validate a policy file, follow this guideline.
The mmchpolicy -I test command validates but does not install a policy file.
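For example (the file system name and policy file path are placeholders):
mmchpolicy fs1 /tmp/policyfile -I test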
Deleting policies
When you remove the current policy rules and restore the file-placement policy, follow this guideline.
To remove the current policy rules and restore the default GPFS file-placement policy, specify DEFAULT
as the name of the policy file on the mmchpolicy command. This is equivalent to installing a policy file
with just one rule:
RULE ’DEFAULT’ SET POOL ’system’
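For example, for a file system named fs1 (a placeholder name):
mmchpolicy fs1 DEFAULT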
One possible way to improve the performance of the mmapplypolicy command is to specify an
alternative sort command to be used instead of the default sort command provided by the operating
system. To do this, issue mmapplypolicy --sort-command SortCommand, specifying the executable path of
the alternative command.
For example, on AIX the GNU sort program, freely available within the coreutils package from AIX
Toolbox for Linux Applications (www.ibm.com/systems/power/software/aix/linux/toolbox), will
typically perform large sorting tasks much faster than the standard AIX sort command. If you wanted to
specify the GNU sort program, you would use the following command: mmapplypolicy --sort-command
/opt/freeware/bin/sort.
If you specify an alternative sort command, it is recommended that you install it on all cluster nodes.
The following topics describe how to work with external storage pools:
v Defining the external pools
v “User-provided program for managing external pools” on page 404
v “File list format” on page 404
v “Record format” on page 405
v “Migrate and recall with external pools” on page 406
v “Pre-migrating files with external storage pools” on page 407
v “Purging files from external storage pools” on page 407
v “Using thresholds to migrate data between pools” on page 400
GPFS file management policy rules control data migration into external storage pools. Before you can
write a migration policy you must define the external storage pool that the policy will reference. After
you define the storage pool, you can then create policies that set thresholds that trigger data migration
into or out of the referenced external pool.
When a storage pool reaches the defined threshold or when you invoke mmapplypolicy, GPFS processes
the metadata, generates a list of files, and invokes a user provided script or program which initiates the
appropriate commands for the external data management application to process the files. This allows
GPFS to transparently control offline storage and provide a tiered storage solution that includes tape or
other media.
Before you can migrate data to an external storage pool, you must define that pool. To define external
storage pools, use a GPFS policy rule as follows:
RULE EXTERNAL POOL ’PoolName’ EXEC ’InterfaceScript’ [OPTS ’OptionsString’] [ESCAPE ’SpecialCharacters’]
Where:
v PoolName defines the name of the storage pool
v InterfaceScript defines the program or script to be invoked to migrate data to or from the external pool
v OptionsString is an optional string that, if provided, will be passed to the InterfaceScript
You must have a separate EXTERNAL POOL rule for each external pool that you wish to define.
For example:
RULE EXTERNAL POOL 'externalpoolA' EXEC '/usr/hsm/bin/hsmControl' OPTS '-server=hsm-manager.nyc.com'
In this example:
v externalpoolA is the name of the external pool
v /usr/hsm/bin/hsmControl is the location of the executable script that will be invoked when there are
files for migration
v -server=hsm-manager.nyc.com is the location of storage pool externalpoolA
For additional information, refer to “User-provided program for managing external pools.”
User-provided program for managing external pools
When the mmapplypolicy command is invoked and a rule dictates that data should be moved to or from
an external pool, the user provided program identified with the EXEC clause in the policy rule launches.
That executable program receives three arguments:
v The command to be executed. Your script should implement each of the following sub-commands:
– LIST - Provides arbitrary lists of files with no semantics on the operation.
– MIGRATE - Migrate files to external storage and reclaim the online space allocated to the file.
– PREMIGRATE - Migrate files to external storage but do not reclaim the online space.
– PURGE - Delete files from both the online file system and the external storage.
– RECALL - Recall files from external storage to the online storage.
– TEST - Test for presence and operation readiness. Return zero for success. Return non-zero if the
script should not be used on a given node.
v The name of a file containing a list of files to be migrated, premigrated, or purged. See “File list
format” for detailed description of the layout of the file.
v Any optional parameters specified with the OPTS clause in the rule. These optional parameters are not
interpreted by the GPFS policy engine.
The mmapplypolicy command invokes the external pool script on all nodes in the cluster that have
installed the script in its designated location. The script must be installed at the node that runs
mmapplypolicy. You can also install the script at other nodes for parallel operation but that is not
required. GPFS may call your exit script one or more times for each command.
Important: Use the EXCLUDE rule to exclude any special files that are created by an external
application. For example, when using IBM Spectrum Protect or Hierarchical Storage Management (HSM),
exclude the .SpaceMan directory to avoid migration of .SpaceMan, which is an HSM repository.
File list format
Each record in a file list that is passed to the external pool script has the following format:
InodeNumber GenNumber SnapId [OptionalShowArgs] -- FullPathToFile
where:
v InodeNumber is a 64-bit inode number.
v GenNumber is a 32-bit file generation number.
v SnapId is a 64-bit snapshot identifier.
v OptionalShowArgs is the result, if any, from the evaluation of the SHOW clause in the policy rule.
v FullPathToFile is a fully qualified path name to the file. When there are multiple paths within a file
system to a particular file (Inode, GenNumber, and SnapId), each path is shown.
v The "--" characters are a field delimiter that separates the optional show parameters from the path
name to the file.
Note: GPFS does not restrict the character set used for path and file names. All characters except '\0' are
valid. To make the files readily parseable, files or directories containing the newline character and/or
other special characters are “escaped”, as described previously, in connection with the ESCAPE
’%special-characters’ clause.
Record format
The format of the records in each file list file can be expressed as:
iAggregate:WEIGHT:INODE:GENERATION:SIZE:iRule:resourceID:attr_flags:path-length!PATH_NAME:pool-length!POOL_NAME[;show-length!SHOW]end-of-record-character
where:
v iAggregate is a grouping index that is assigned by mmapplypolicy.
v WEIGHT represents the WEIGHT policy language file attribute.
v INODE represents the INODE policy language file attribute.
v GENERATION represents the GENERATION policy language file attribute.
v SIZE represents the SIZE policy language file attribute.
v iRule is a rule index number assigned by mmapplypolicy, which relates to the policy rules file that is
supplied with the -P argument.
v resourceID represents a pool index, USER_ID, GROUP_ID, or fileset identifier, depending on whether
thresholding is done with respect to pool usage or to user, group, or fileset quotas.
v attr_flags represents a hexadecimal encoding of some of the attributes that are also encoded by the
policy language variable MISC_ATTRIBUTES. The low-order 20 bits of attr_flags are taken from the
ia_flags word that is defined in the gpfs.h API definition.
v path-length represents the length of the character string PATH_NAME.
v pool-length represents the length of the character string POOL_NAME.
v show-length represents the length of the character string SHOW.
v end-of-record-character is \n or \0.
Note: You can only change the values of the iAggregate, WEIGHT, SIZE, and attr_flags fields. Changing
the values of other fields can cause unpredictable policy execution results.
All of the numeric fields are represented as hexadecimal strings, except the path-length, pool-length, and
show-length fields, which are decimal encoded. These fields can be preceded by a minus sign ( - ), which
indicates that the string that follows it contains escape sequences. In this case, the string might contain
occurrences of the character pair \n, which represents a single newline character with a hexadecimal
value of 0xA in the filename. Also, the string might contain occurrences of the character pair \\, which
represents a single \ character in the filename. A \ will only be represented by \\ if there are also
newline characters in the filename. The value of the length field within the record counts any escape
characters.
The encoding of WEIGHT is based on the 64-bit IEEE floating format, but its bits are flipped so that when
a file list is sorted using a conventional collating sequence, the files appear in decreasing order, according
to their WEIGHT.
The encoding of WEIGHT can be expressed and printed using C++ as:
double w = - WEIGHT;
/* This code works correctly on big-endian and little-endian systems */
uint64 u = *(uint64*)&w; /* u is a 64 bit long unsigned integer
containing the IEEE 64 bit encoding of the double floating point
value of variable w */
uint64 hibit64 = ((uint64)1<<63);
if (w < 0.0) u = ~u; /* flip all bits */
else u = u | hibit64; /* force the high bit from 0 to 1,
also handles both "negatively" and "positively" signed 0.0 */
printf("%016llx",u);
The format of the majority of each record can be expressed in C++ as:
printf("%03x:%016llx:%016llx:%llx:%llx:%x:%x:%llx:%d!%s:%d!%s",
iAggregate, u /*encoding of –1*WEIGHT from above*/, INODE, ... );
Notice that the first three fields are fixed in length to facilitate the sorting of the records by the field
values iAggregate, WEIGHT, and INODE.
The format of the optional SHOW string portion of the record can be expressed as:
if(SHOW && SHOW[0]) printf(";%d!%s",strlen(SHOW),SHOW);
For more information, see the topic mmapplypolicy command in the IBM Spectrum Scale: Command and
Programming Reference.
Migrate and recall with external pools
When you invoke mmapplypolicy and a rule dictates that data should be deleted or moved to or from
an external pool, the program identified in the EXTERNAL POOL rule is invoked with the following
arguments:
v The command to be executed.
v The name of the file containing a list of files to be migrated, pre-migrated, or purged.
v Optional parameters, if any.
To move files from the internal system pool to storage pool "externalpoolA" you would simply define a
migration rule that may look something like this:
RULE ’MigToExt’ MIGRATE FROM POOL(’system’) TO POOL(’externalpoolA’) WHERE ...
This would result in the external pool script being invoked as follows:
/usr/hsm/bin/hsmControl MIGRATE /tmp/filelist -server=hsm-manager.nyc.com
Similarly, a rule to migrate data from an external pool back to an internal storage pool could look like:
RULE ’MigFromExt’ MIGRATE FROM POOL ’externalpoolA’ TO POOL ’system’ WHERE ...
This would result in the external pool script being invoked as follows:
/usr/hsm/bin/hsmControl RECALL /tmp/filelist -server=hsm-manager.nyc.com
Pre-migrating files with external storage pools
Pre-migration copies data from GPFS internal storage pools to external pools but leaves the original data
online in the active file system. Pre-migrated files are often referred to as "dual resident" to indicate that
the data for the files are available both online in GPFS and offline in the external storage manager. Files
in the pre-migrated state allow the external storage manager to respond more quickly to low space
conditions by simply deleting the copy of the file data that is stored online.
The files to be pre-migrated are determined by the policy rules that migrate data to an external storage
pool. The rule will select files to be migrated and optionally select additional files to be pre-migrated. The
THRESHOLD clause of the rule determines the files that need to be pre-migrated.
If you specify the THRESHOLD clause in file migration rules, the mmapplypolicy command selects files
for migration when the affected storage pool reaches the specified high occupancy percentage threshold.
Files are migrated until the storage pool utilization is reduced to the specified low occupancy percentage
threshold. When migrating to an external storage pool, GPFS allows you to specify a third pool
occupancy percentage which defines the file pre-migration threshold: after the low occupancy percentage
is reached, files are pre-migrated until the pre-migration occupancy percentage is reached.
To explain thresholds in another way, think of an internal storage pool with a high threshold of 90%, a
low threshold of 80%, and a pre-migrate threshold of 60%. When this internal storage pool reaches 90%
occupancy, the policy rule migrates files until the occupancy of the pool reaches 80%, and then it
continues to pre-migrate another 20% of the file space until the 60% threshold is reached.
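These three percentages are specified together in the THRESHOLD clause of a migration rule. As an
illustrative sketch (the pool names and WHERE clause follow the earlier example), such a rule might look
like:
RULE 'MigToExt' MIGRATE FROM POOL 'system' THRESHOLD(90,80,60) TO POOL 'externalpoolA' WHERE ...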
Pre-migration can only be done with external storage managers using the XDSM Data Storage
Management API (DMAPI). Files in the migrated and pre-migrated state will have a DMAPI managed
region set on the file data. Files with a managed region are visible to mmapplypolicy and may be
referenced by a policy rule. You can approximate the amount of pre-migrated space required by counting
the space used after the end of the first full data block on all files with managed regions.
Note:
1. If you do not set a pre-migrate threshold or if you set a value that is greater than or equal to the low
threshold, then GPFS will not pre-migrate files. This is the default setting.
2. If you set the pre-migrate threshold to zero, then GPFS will pre-migrate all files.
If the file has been migrated or pre-migrated, this would result in the external pool script being invoked
as follows:
/usr/hsm/bin/hsmControl PURGE /tmp/filelist -server=hsm-manager.nyc.com
The script should delete a file from both the online file system and the external storage manager.
However, most HSM systems automatically delete a file from the external storage manager whenever the
online file is deleted. If that is how your HSM system functions, your script will only have to delete the
online file.
You can use the GPFS ILM tools to backup data for disaster recovery or data archival to an external
storage manager such as the IBM Spectrum Protect Backup-Archive client. When backing up data, the
external storage manager must preserve the file name, attributes, extended attributes, and the file data.
Among other things, the extended attributes of the file also contain information about the assigned
storage pool for the file. When you restore the file, this information is used to assign the storage pool for
the file data.
The file data may be restored to the storage pool to which it was assigned when it was backed up or it
may be restored to a pool selected by a restore or placement rule using the backed up attributes for the
file. GPFS supplies three subroutines that support backup and restore functions with external pools:
v gpfs_fgetattrs()
v gpfs_fputattrs()
v gpfs_fputattrswithpathname()
GPFS exports the extended attributes for a file, including its ACLs, using gpfs_fgetattrs(). Included in the
extended attributes is the name of the storage pool to which the file has been assigned, as well as file
attributes that are used for file placement. When the file is restored the extended attributes are restored
using either gpfs_fputattrs() or gpfs_fputattrswithpathname().
When a backup application uses gpfs_fputattrs() to restore the file, GPFS assigns the restored file to the
storage pool with the same name as when the file was backed up. Thus by default, restored files are
assigned to the same storage pool they were in when they were backed up. If that pool is not available,
GPFS tries to select a pool using the current file placement rules. If that fails, GPFS assigns the file to the
system storage pool.
Note: If a backup application uses gpfs_fputattrs() to restore a file, the RESTORE rules are not applied.
When a backup application restores the file using gpfs_fputattrswithpathname(), GPFS is able to access
additional file attributes that may have been used by placement or migration policy rules to select the
storage pool for the file. This information includes the UID and GID for the owner, the access time for the
file, file modification time, file size, the amount of storage allocated, and the full path to the file. GPFS
uses gpfs_fputattrswithpathname() to match this information with restore policy rules you define.
In other words, the RESTORE rule looks at saved file attributes rather than the current file attributes.
The call to gpfs_fputattrswithpathname() tries to match the saved information to a RESTORE rule. If the
RESTORE rules cannot match saved attributes, GPFS tries to restore the file to the same storage pool it
was in when the file was backed up. If that pool is not available GPFS tries to select a pool by matching
placement rules. If that fails, GPFS assigns the file to the system storage pool.
Note: When a RESTORE rule is used, and restoring the file to the specified pool would exceed the
occupancy percentage defined for that pool, GPFS skips that rule and the policy engine looks for the next
rule that matches. While testing for matching rules, GPFS takes into account the specified replication
factor and the KB_ALLOCATED attribute of the file that is being restored.
External lists must be defined before they can be used. External lists are defined by:
RULE EXTERNAL LIST ’ListName’ EXEC ’InterfaceScript’ [OPTS ’OptionsString’] [ESCAPE ’SpecialCharacters’]
Where:
v ListName defines the name of the external list
v InterfaceScript defines the program to be invoked to operate on the list of files
v OptionsString is an optional string that, if provided, will be passed to the InterfaceScript
Example
The following rule defines an external list named listfiles:
RULE EXTERNAL LIST 'listfiles' EXEC '/var/mmfs/etc/listControl' OPTS '-verbose'
In this example:
v listfiles is the name of the external list
v /var/mmfs/etc/listControl is the location of the executable script that defines the operations on the
list of files
v -verbose is an optional flag to the listControl script
The EXTERNAL LIST rule provides the binding between the lists generated with regular LIST rules and
the external program that you want to run with these lists as input. For example, this rule would
generate a list of all files that have more than 1 MB of data in an internal storage pool:
RULE ’ListLargeFiles’ LIST ’listfiles’ WHERE KB_ALLOCATED > 1024
By default, only user files are included in lists. To include directories, symbolic links, and other file
system objects, the DIRECTORIES_PLUS clause must be specified. For example, this rule would generate
a list of all objects in the file system.
RULE ’ListAllObjects’ LIST ’listfiles’ DIRECTORIES_PLUS
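To illustrate how these rules are used together (the policy file name is illustrative), the EXTERNAL LIST
rule and one or more LIST rules are placed in a policy file and applied with the mmapplypolicy command:
mmapplypolicy fs1 -P /tmp/list.policy
The interface script named in the EXTERNAL LIST rule is then invoked with the list operation, the name
of the generated file list, and the OPTS string, in a manner similar to the external pool scripts described
earlier in this chapter.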
Similar to the files in the root file system, snapshot data can also be managed by using policy rules. Rules
can be written to migrate snapshot data among internal storage pools or to place snapshot data in specific
pools when it is generated.
Snapshot data can be migrated by using the mmapplypolicy command with simple migration rules. For
example, to migrate data of a snapshot with the name snapname from an SSD pool to the Capacity pool,
use the following rule:
RULE ’MigToCap’ MIGRATE FROM POOL ’SSD’ TO POOL ’Capacity’
Then, run the mmapplypolicy command with the -S snapname parameter to complete the migration.
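For example, assuming that the rule above is saved in a policy file named snap.policy (the file name is
illustrative), a command similar to the following could be used:
mmapplypolicy fs1 -S snapname -P snap.policy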
Snapshot data belonging to AFM and AFM DR can also be migrated. Use the following rule:
RULE ’migrate’ MIGRATE FROM POOL ’POOL1’ TO POOL ’POOL2’
In this example, data is migrated from POOL1 to POOL2. You must exclude files that are internal to AFM
while migrating snapshot data. An example of a rule that excludes such files is as follows:
RULE 'migrate' MIGRATE FROM POOL 'POOL1' TO POOL 'POOL2' WHERE
NOT ((PATH_NAME LIKE '/%/.afm%') OR (PATH_NAME LIKE '/%/.ptrash%')
OR (PATH_NAME LIKE '/%/.afmtrash%') OR (PATH_NAME LIKE '/%/.pconflicts%'))
Note:
v The snapshot data cannot be migrated to external pools.
v The migration rules for snapshot data cannot be mixed with other rule types.
v The SetXattr file function is not allowed in either MIGRATE or SET SNAP_POOL rules for snapshot files.
A snapshot placement rule can be used to generate snapshot data in specific internal pools. For example,
to generate the snapshot data for all snapshots in the Capacity pool, use the following rule:
RULE ’SnapPlacement’ SET SNAP_POOL ’Capacity’
Snapshot data for specific snapshots can be placed in specific pools by using the following rule:
RULE ’SnapPlacement’ SET SNAP_POOL ’Capacity’ WHERE SNAP_NAME LIKE ’%daily%’
Include this rule in the set of rules installed for the file system. Placement of a snapshot file happens
when the first data block is copied to it because of the changes made to the file in the root file system.
The placement rule can be applied to snapshot data belonging to AFM and AFM DR. In the following
example, snap pool is set as POOL1, for all snapshots having psnap as a sub-pattern in the name.
RULE ’setsnappool’ SET SNAP_POOL ’POOL1’ WHERE SNAP_NAME LIKE ’%psnap%’
Deleting files can cause their data to be moved into a snapshot, where the files can become ill-placed or
ill-replicated. In these cases, the mmrestripefile command can be used to correct the ill placement and ill
replication of snapshot files.
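For example (the file names are illustrative; see the mmrestripefile topic in the IBM Spectrum Scale:
Command and Programming Reference for the full list of options), commands similar to the following might
be used to repair the placement or the replication of individual snapshot files:
mmrestripefile -p /fs1/.snapshots/snap1/userA/file2
mmrestripefile -r /fs1/.snapshots/snap1/userA/file3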
Filesets
In most file systems, a file hierarchy is represented as a series of directories that form a tree-like structure.
Each directory contains other directories, files, or other file-system objects such as symbolic links and
hard links. Every file system object has a name associated with it, and is represented in the namespace as
a node of the tree.
In addition, GPFS utilizes a file system object called a fileset. A fileset is a subtree of a file system
namespace that in many respects behaves like an independent file system. Filesets provide a means of
partitioning the file system to allow administrative operations at a finer granularity than the entire file
system.
GPFS supports independent and dependent filesets. An independent fileset is a fileset with its own inode
space. An inode space is a collection of inode number ranges reserved for an independent fileset. An
inode space enables more efficient per-fileset functions, such as fileset snapshots. A dependent fileset
shares the inode space of an existing, independent fileset. Files created in a dependent fileset are assigned
inodes in the same collection of inode number ranges that were reserved for the independent fileset from
which it was created.
When the file system is created, only one fileset, called the root fileset, exists. The root fileset is an
independent fileset that cannot be deleted. It contains the root directory as well as any system files such
as quota files. As new files and directories are created, they automatically become part of the parent
directory's fileset. The fileset to which a file belongs is largely transparent for ordinary file access, but the
containing fileset can be displayed along with the other attributes of each file using the mmlsattr -L
command.
The root directory of a GPFS file system is also the root of the root fileset.
Fileset namespace
A newly created fileset consists of an empty directory for the root of the fileset, and it is initially not
linked into the file system's namespace. A newly created fileset is not visible to the user until it is
attached to the namespace by issuing the mmlinkfileset command.
Filesets are attached to the namespace with a special link called a junction. A junction is a special
directory entry, much like a POSIX hard link, that connects a name in a directory of one fileset (source) to
the root directory of another fileset (target). A fileset may be the target of only one junction, so that a
fileset has a unique position in the namespace and a unique path to any of its directories. The target of
the junction is referred to as the child fileset, and a fileset can have any number of children. From the
user's viewpoint, a junction always appears as if it were a directory, but the user is not allowed to issue
the unlink or rmdir commands on a junction.
Once a fileset has been created and linked into the namespace, an administrator can unlink the fileset
from the namespace by issuing the mmunlinkfileset command. This makes all files and directories
within the fileset inaccessible. If other filesets were linked below it, the other filesets become inaccessible,
but they do remain linked and will become accessible again when the fileset is re-linked. Unlinking a
fileset, like unmounting a file system, fails if there are open files. The mmunlinkfileset command has a
force option to close the files and force the unlink. If there are open files in a fileset and the fileset is
unlinked with the force option, future references to those files will result in ESTALE errors. Once a fileset
is unlinked, it can be re-linked into the namespace at its original location or any other location (it cannot
be linked into its children since they are not part of the namespace while the parent fileset is unlinked).
The namespace inside a fileset is restricted to a single, connected subtree. In other words, a fileset has
only one root directory and no other entry points such as hard links from directories in other filesets.
Filesets are always connected at the root directory and only the junction makes this connection.
Consequently, hard links cannot cross fileset boundaries. Symbolic links, of course, can be used to
provide shortcuts to any file system object in the namespace.
The root fileset is an exception. The root fileset is attached to the local namespace using the standard
mount command. It cannot be created, linked, unlinked or deleted using the GPFS fileset commands.
Filesets and quotas
The GPFS quota commands support the -j option for fileset block and inode allocation.
The quota limits on blocks and inodes in a fileset are independent of the limits for specific users or groups
of users. See the following commands in the IBM Spectrum Scale: Command and Programming Reference:
v mmdefedquota
v mmdefedquotaon
v mmdefedquotaoff
v mmedquota
v mmlsquota
v mmquotaoff
v mmquotaon
v mmrepquota
In addition, see the description of the --perfileset-quota parameter of the following commands:
v mmchfs
v mmcrfs
v mmlsfs
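For example, assuming that per-fileset quota is enabled for a file system fs1 that contains a fileset named
fset1 (the names are illustrative), fileset quotas might be managed with commands similar to the following:
mmchfs fs1 --perfileset-quota
mmedquota -j fs1:fset1
mmlsquota -j fset1 fs1
mmrepquota -j fs1
Here mmedquota -j opens the quota editor for the fileset, and mmlsquota -j and mmrepquota -j display
the quota settings and current usage for the fileset.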
A storage pool can contain files from many filesets. However, all of the data for a particular file is wholly
contained within one storage pool.
Using file-placement policies, you can specify that all files created in a particular fileset are to be stored in
a specific storage pool. Using file-management policies, you can define how files in a specific fileset are to
be moved or deleted during the file's life cycle. See “Policy rules: Terms” on page 374.
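For example (the rule, fileset, and pool names are illustrative), a placement rule and a management rule
that are scoped to a single fileset might look like the following:
RULE 'fset1Place' SET POOL 'Capacity' FOR FILESET ('fset1')
RULE 'fset1Clean' DELETE FOR FILESET ('fset1') WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 365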
The state of filesets in the snapshot is unaffected by changes made to filesets in the active file system,
such as unlink, link or delete. The saved file system can be accessed through the .snapshots directories
and the namespace, including all linked filesets, appears as it did when the snapshot was created.
Unlinked filesets are inaccessible in the snapshot, as they were in the active file system. However,
restoring a snapshot also restores the unlinked filesets, which can then be re-linked and accessed.
If a fileset is included in a global snapshot, it can be deleted but it is not entirely removed from the file
system. In this case, the fileset is emptied of all contents and given a status of 'deleted'. The contents of a
fileset remain available in the snapshots that include the fileset (that is, through some path containing a
.snapshots component) even after the fileset is deleted, since all the contents of the fileset are saved when
a snapshot is created. The fileset remains in the deleted state until the last snapshot containing it is
deleted, at which time the fileset is automatically deleted.
A fileset is included in a global snapshot if the snapshot is created after the fileset was created. Deleted
filesets appear in the output of the mmlsfileset and mmlsfileset --deleted commands, and the -L option
can be used to display the latest snapshot that includes a fileset.
Fileset-level snapshots
Instead of creating a global snapshot of an entire file system, a fileset snapshot can be created to preserve
the contents of a single independent fileset plus all dependent filesets that share the same inode space.
If an independent fileset has dependent filesets that share its inode space, then a snapshot of the
independent fileset will also include those dependent filesets. In other words, a fileset snapshot is a
snapshot of the whole inode space.
Each independent fileset has its own hidden .snapshots directory in the root directory of the fileset that
contains any fileset snapshots. The mmsnapdir command allows setting an option that makes global
snapshots also available through .snapshots in the root directory of all independent filesets. The
.snapshots directory in the file system root directory lists both global snapshots and fileset snapshots of
the root fileset (the root fileset is an independent fileset). This behavior can be customized with the
mmsnapdir command.
Fileset snapshot names need not be unique across different filesets, so it is valid to use the same name for
fileset snapshots of two different filesets because they will appear under .snapshots in two different
fileset root directories.
You can restore independent fileset snapshot data and attribute files with the mmrestorefs command. For
complete usage information, see the topic mmrestorefs command in the IBM Spectrum Scale: Command and
Programming Reference.
IBM Spectrum Protect has no mechanism to create or link filesets during restore. Therefore, if a file
system is migrated to IBM Spectrum Protect and then filesets are unlinked or deleted, restore or recall of
the file system does not restore the filesets.
During a full restore from backup, all fileset information is lost and all files are restored into the root
fileset. It is recommended that you save the output of the mmlsfileset command to aid in the
reconstruction of fileset names and junction locations. Saving mmlsfileset -L also allows reconstruction of
fileset comments. Both command outputs are needed to fully restore the fileset configuration.
A partial restore can also lead to confusion if filesets have been deleted, unlinked, or their junctions
moved, since the backup was made. For example, if the backed up data was in a fileset that has since
been unlinked, the restore process puts it into files and directories in the parent fileset. The unlinked
fileset cannot be re-linked into the same location until the restored data is moved out of the way.
Similarly, if the fileset was deleted, restoring its contents does not recreate the deleted fileset, but the
contents are instead restored into the parent fileset.
Since the mmbackup command operates by traversing the directory structure, it does not include the
contents of unlinked filesets, even though they are part of the file system. If it is desired to include these
filesets in the backup, they should be re-linked, perhaps into a temporary location. Conversely,
temporarily unlinking a fileset is a convenient mechanism to exclude it from a backup.
Note: It is recommended not to unlink filesets when doing backups. Unlinking a fileset during an
mmbackup run can cause the following:
v failure to back up changes in files that belong to an unlinked fileset
v expiration of files that were backed up in a previous mmbackup run
In summary, fileset information should be saved by periodically recording mmlsfileset output somewhere
in the file system, where it is preserved as part of the backup process. During restore, care should be
exercised when changes in the fileset structure have occurred since the backup was created.
Attention: If you are using the IBM Spectrum Protect Backup-Archive client you must use caution when
you unlink filesets that contain data backed up by IBM Spectrum Protect. IBM Spectrum Protect tracks
files by pathname and does not track filesets. As a result, when you unlink a fileset, it appears to IBM
Spectrum Protect that you deleted the contents of the fileset. Therefore, the IBM Spectrum Protect
Backup-Archive client inactivates the data on the TSM server which may result in the loss of backup data
during the expiration process.
Managing filesets
Managing your filesets includes:
v “Creating a fileset”
v “Deleting a fileset” on page 415
v “Linking a fileset” on page 415
v “Unlinking a fileset” on page 416
v “Changing fileset attributes” on page 416
v “Displaying fileset information” on page 416
Creating a fileset
Filesets are created with the mmcrfileset command.
By default, filesets are created as dependent filesets that share the inode space of the root. The
--inode-space ExistingFileset option can be used to create a dependent fileset that shares inode space with
an existing fileset. The --inode-space new option can be used to create an independent fileset with its
own dedicated inode space.
A newly created fileset consists of an empty directory for the root of the fileset and it is initially not
linked into the existing namespace. Consequently, a new fileset is not visible and files cannot be added to
it, but the fileset name is valid and the administrator can establish quotas on it or policies for it. The
administrator must link the fileset into its desired location in the file system's namespace by issuing the
mmlinkfileset command in order to make use of it.
After the fileset is linked, the administrator can change the ownership and permissions for the new root
directory of the fileset, which default to root and 0700, to allow users access to it. Files and directories
copied into or created within the directory of the fileset become part of the new fileset.
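For example, assuming a file system fs1 and a new independent fileset named fset1 (the names, junction
path, owner, and mode are illustrative), the sequence might look like the following:
mmcrfileset fs1 fset1 --inode-space new
mmlinkfileset fs1 fset1 -J /fs1/fset1
chown user1:group1 /fs1/fset1
chmod 750 /fs1/fset1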
For more information, see the topics mmcrfileset command and mmlinkfileset command in the IBM Spectrum
Scale: Command and Programming Reference.
Deleting a fileset
Filesets are deleted with the mmdelfileset command.
For complete usage information, see the topics mmdelfileset command, mmlsfileset command, and
mmunlinkfileset command in the IBM Spectrum Scale: Command and Programming Reference.
Linking a fileset
After the fileset is created, a junction must be created to link it to the desired location in the file system's
namespace using the mmlinkfileset command.
The file system must be mounted in order to link a fileset. An independent fileset can be linked into only
one location anywhere in the namespace, specified by the JunctionPath parameter:
v The root directory
v Any subdirectory
v The root fileset or to any other fileset
A dependent fileset can only be linked inside its own inode space.
If JunctionPath is not specified, the junction is created in the current directory and has the same name as
the fileset being linked. After the command completes, the new junction appears as an ordinary directory,
except that the user is not allowed to unlink it or delete it with the rmdir command. The user can use the
mv command on the directory to move it to a new location in the parent fileset, but the mv command is
not allowed to move the junction to a different fileset.
For complete usage information, see the topic mmlinkfileset command in the IBM Spectrum Scale: Command
and Programming Reference.
Unlinking a fileset
A junction to a fileset is removed with the mmunlinkfileset command, which unlinks the fileset only
from the active directory namespace. The linked or unlinked state of a fileset in a snapshot is unaffected.
The unlink fails if there are files open in the fileset, unless the -f option is specified. The root fileset
cannot be unlinked.
After issuing the mmunlinkfileset command, the fileset can be re-linked to a different parent using the
mmlinkfileset command. Until the fileset is re-linked, it is not accessible.
Note: If run against a file system that has an unlinked fileset, mmapplypolicy will not traverse the
unlinked fileset.
Attention: If you are using the IBM Spectrum Protect Backup-Archive client you must use caution when
you unlink filesets that contain data backed up by IBM Spectrum Protect. IBM Spectrum Protect tracks
files by pathname and does not track filesets. As a result, when you unlink a fileset, it appears to IBM
Spectrum Protect that you deleted the contents of the fileset. Therefore, the IBM Spectrum Protect
Backup-Archive client inactivates the data on the IBM Spectrum Protect server which may result in the
loss of backup data during the expiration process.
For complete usage information, see the topic mmunlinkfileset command in the IBM Spectrum Scale:
Command and Programming Reference.
Changing fileset attributes
To change the attributes of an existing fileset, including the fileset name, use the mmchfileset command.
Note: In an HSM-managed file system, moving or renaming migrated files between filesets results in a
recall of the data from the IBM Spectrum Protect server.
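For example (the fileset name, new name, and comment are illustrative), the following command renames
a fileset and changes its comment:
mmchfileset fs1 fset1 -j projects2018 -t "Project data for 2018"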
For complete usage information, see the topics mmchfileset command, mmlinkfileset command, and
mmunlinkfileset command in the IBM Spectrum Scale: Command and Programming Reference.
Displaying fileset information
Fileset information is displayed with the mmlsfileset command.
For complete usage information, see the topic mmlsfileset command in the IBM Spectrum Scale: Command
and Programming Reference.
To display the name of the fileset that includes a given file, run the mmlsattr command and specify the
-L option. For complete usage information, see the topic mmlsattr command in the IBM Spectrum Scale:
Command and Programming Reference.
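For example, the following commands list all filesets in fs1 with full details and show the fileset that
contains a particular file (the file path is illustrative):
mmlsfileset fs1 -L
mmlsattr -L /fs1/fset1/datafile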
An immutable file cannot be changed or renamed. An appendOnly file allows append operations, but not
delete, modify, or rename operations.
An immutable directory cannot be deleted or renamed, and files cannot be added or deleted under such
a directory. An appendOnly directory allows new files or subdirectories to be created with 0 byte length;
all such newly created files and subdirectories are marked as appendOnly automatically.
The immutable flag and the appendOnly flag can be set independently. If both immutability and
appendOnly are set on a file, immutability restrictions will be in effect.
Note: Before an immutable or appendOnly file can be deleted, you must change it to mutable or set
appendOnly to no (by using the mmchattr command).
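For example (the file names are illustrative), the immutable and appendOnly flags are set, displayed, and
cleared with the mmchattr and mmlsattr commands:
mmchattr -i yes /fs1/fset1/report.pdf
mmchattr -a yes /fs1/fset1/audit.log
mmlsattr -L /fs1/fset1/report.pdf
mmchattr -i no /fs1/fset1/report.pdf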
The effects of file operations on immutable and appendOnly files
Once a file has been set as immutable or appendOnly, the following file operations and attributes work
differently from the way they work on regular files:
delete An immutable or appendOnly file cannot be deleted.
modify/append
An appendOnly file cannot be modified, but it can be appended. An immutable file cannot be
modified or appended.
Note: The immutable and appendOnly flag check takes effect after the file is closed; therefore,
the file can be modified if it is opened before the file is changed to immutable.
mode An immutable or appendOnly file's mode cannot be changed.
ownership, acl
These attributes cannot be changed for an immutable or appendOnly file.
extended attributes
These attributes cannot be added, deleted, or modified for an immutable or appendOnly file.
timestamp
The timestamp of an immutable or appendOnly file can be changed.
directory
If a directory is marked as immutable, no files can be created, renamed, or deleted under that
directory. However, a subdirectory under an immutable directory remains mutable unless it is
explicitly changed by mmchattr.
If a directory is marked as appendOnly, no files can be renamed or deleted under that directory.
However, 0 byte length files can be created.
The following table shows the effects of file operations on an immutable file or an appendOnly file:
Table 41. The effects of file operations on an immutable file or an appendOnly file
Operation                                    immutable                                          appendOnly
Add, delete, modify, or rename               No                                                 No
Append                                       No                                                 Yes
Change ownership, mode, or acl               No                                                 No
Change atime, mtime, or ctime                Yes                                                Yes
Add, delete, or modify extended attributes   Disallowed by external methods such as setfattr.   Same as for immutable.
You can modify the file-operation restrictions that apply to the immutable files in a fileset by setting an
integrated archive manager (IAM) mode for the fileset. The following table shows the effects of each of
the IAM modes.
Note: To set an IAM mode for a fileset, issue the mmchfileset command with the --iam-mode parameter.
For more information, see the topic mmchfileset in the IBM Spectrum Scale: Command and Programming
Reference.
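For example (the fileset name, mode value, and timestamp format are illustrative; see the mmchfileset and
mmchattr topics for the exact syntax), an IAM mode might be set for a fileset and an expiration time set
on a file as follows:
mmchfileset fs1 fset1 --iam-mode compliant
mmchattr -E 2026-12-31@23:59:59 /fs1/fset1/record.txt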
Table 42. IAM modes and their effects on file operations on immutable files
The modes compared are: Regular mode, Advisory mode, Noncompliant mode, Compliant mode, and Compliant-plus mode.
Modify: No in all modes.
Append: No in all modes.
Rename: No in all modes.
Change ownership, acl: No in all modes.
Change mode: No in all modes.
Change atime, mtime, ctime: Regular mode: Yes. All other modes: mtime and ctime can be changed; atime is
overloaded by the expiration time. The expiration time can be changed by using the mmchattr
--expiration-time command (alternatively mmchattr -E) or touch, and is reported by stat as atime.
Add, delete, or modify extended attributes: Regular mode: Not allowed for external methods such as setfattr;
allowed internally for dmapi, directio, and so on. All other modes: Yes.
Create, rename, or delete under an immutable directory: No in all modes.
Modify mutable files under an immutable directory: Yes in all modes.
Retention rule enforced: Regular mode: No retention rule; immutable files cannot be deleted. Advisory mode: No.
Noncompliant, Compliant, and Compliant-plus modes: Yes.
Set ExpirationTime backwards: Regular, Advisory, and Noncompliant modes: Yes. Compliant and Compliant-plus modes: No.
Delete an immutable file: Regular mode: No. Advisory mode: Yes, always. Noncompliant, Compliant, and
Compliant-plus modes: Yes, only when expired.
Set an immutable file back to mutable: Regular mode: Yes. All other modes: No.
Allow hard links: Regular mode: No for immutable or appendOnly files. All other modes: No.
Snapshots of a file system are read-only; changes can only be made to the active (that is, normal,
non-snapshot) files and directories.
The snapshot function allows a backup or mirror program to run concurrently with user updates and still
obtain a consistent copy of the file system as of the time that the snapshot was created. Snapshots also
provide an online backup capability that allows easy recovery from common problems such as accidental
deletion of a file, and comparison with older versions of a file.
Notes:
1. Because snapshots are not copies of the entire file system, they should not be used as protection
against media failures. For information about protection against media failures, see the topic
Recoverability considerations in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
2. Fileset snapshots provide a method to create a snapshot of an independent fileset instead of the entire
file system. For more information about fileset snapshots, see “Fileset-level snapshots” on page 413.
3. A snapshot of a file creates a new file that captures the user data and user attributes from the original.
The snapshot file is independent from the original file. For DMAPI managed file systems, the
snapshot of a file is not automatically managed by DMAPI, regardless of the state of the original file.
The DMAPI attributes from the original file are not inherited by the snapshot. For more information
about DMAPI restrictions for GPFS, see the IBM Spectrum Scale: Command and Programming Reference.
4. When snapshots are present, deleting files from the active file system does not always result in any
space actually being freed up; rather, blocks may be pushed to the previous snapshot. In this
situation, the way to free up space is to delete the oldest snapshot. Before creating new snapshots, it is
good practice to ensure that the file system is not close to being full.
5. The use of clones functionally provides writable snapshots. See Chapter 28, “Creating and managing
file clones,” on page 429.
Creating a snapshot
Use the mmcrsnapshot command to create a snapshot of an entire GPFS file system at a single point in
time. Snapshots appear in the file system tree as hidden subdirectories of the root.
Global snapshots appear in a subdirectory in the root directory of the file system, whose default name is
.snapshots. If you prefer to access snapshots from each directory rather than traversing through the root
directory, you can use the mmsnapdir command, as described in "Linking to a snapshot" later in this chapter.
A snapshot of the file system Device is identified by the SnapshotName parameter on the mmcrsnapshot
command. For example, to create a snapshot named snap1 of the file system fs1, enter:
mmcrsnapshot fs1 snap1
Before issuing the command, the directory structure would appear similar to:
/fs1/file1
/fs1/userA/file2
/fs1/userA/file3
After the command has been issued, the directory structure would appear similar to:
/fs1/file1
/fs1/userA/file2
/fs1/userA/file3
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
If a second snapshot were to be created at a later time, the first snapshot would remain as is. A snapshot
can be made only of an active file system, not of an existing snapshot. The following command creates
another snapshot of the same file system:
mmcrsnapshot fs1 snap2
After the command has been issued, the directory structure would appear similar to:
/fs1/file1
/fs1/userA/file2
/fs1/userA/file3
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
/fs1/.snapshots/snap2/file1
/fs1/.snapshots/snap2/userA/file2
/fs1/.snapshots/snap2/userA/file3
For complete usage information, see the topic mmcrsnapshot command in the IBM Spectrum Scale: Command
and Programming Reference.
Listing snapshots
Use the mmlssnapshot command to display existing snapshots of a file system and their attributes.
For example, to display the snapshot information for the file system fs1 with additional storage
information, issue this command:
mmlssnapshot fs1 -d
For complete usage information, see the topic mmlssnapshot command in the IBM Spectrum Scale: Command
and Programming Reference.
Prior to issuing the mmrestorefs command, ensure that the file system is mounted. When restoring from
an independent fileset snapshot, ensure that the fileset is in linked state.
Existing snapshots, including the one being used in the restore, are not modified by the mmrestorefs
command. To obtain a snapshot of the restored file system, you must issue the mmcrsnapshot command
to capture it before issuing the mmrestorefs command again.
As an example, suppose that you have a directory structure similar to the following:
/fs1/file1
/fs1/userA/file2
/fs1/userA/file3
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
If the directory userA is then deleted, the structure becomes similar to this:
/fs1/file1
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/userA/file2
/fs1/.snapshots/snap1/userA/file3
The directory userB is then created using the inode originally assigned to userA, and another snapshot is
taken:
mmcrsnapshot fs1 snap2
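To restore the active file system from the first snapshot, a command similar to the following could then
be issued:
mmrestorefs fs1 snap1
When restoring from an independent fileset snapshot, the fileset is identified in addition to the snapshot
name, as described in the mmrestorefs topic.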
For complete usage information, see the topic mmrestorefs command in the IBM Spectrum Scale: Command
and Programming Reference.
Notes:
1. Snapshots are read-only. Policy rules such as MIGRATE or DELETE that make changes or delete files
cannot be used with a snapshot.
2. An instance of mmapplypolicy can only scan one snapshot. Directing it at the .snapshots directory
itself will result in a failure.
For complete usage information, see the topic mmapplypolicy command in the IBM Spectrum Scale: Command
and Programming Reference.
Linking to a snapshot
Snapshot root directories appear in a special .snapshots directory under the file system root.
If you prefer to link directly to the snapshot rather than always traverse the root directory, you can use
the mmsnapdir command with the -a option to add a .snapshots subdirectory to all directories in the file
system. These .snapshots subdirectories will contain a link into the corresponding directory for each
snapshot that includes the directory in the active file system.
Unlike .snapshots in the root directory, however, the .snapshots directories added by the -a option of the
mmsnapdir command are invisible in the sense that the ls command or readdir() function does not
return .snapshots. This is to prevent recursive file system utilities such as find or tar from entering into
the snapshot tree for each directory they process. For example, if you enter ls -a /fs1/userA, the
.snapshots directory is not listed. However, you can enter ls /fs1/userA/.snapshots or cd
/fs1/userA/.snapshots to confirm that .snapshots is present. If a user wants to make one of their snapshot
directories more visible, it is suggested to create a symbolic link to .snapshots.
The inode numbers that are used for and within these special .snapshots directories are constructed
dynamically and do not follow the standard rules. These inode numbers are visible to applications
Specifying the -r option on the mmsnapdir command reverses the effect of the -a option, and reverts to
the default behavior of a single .snapshots directory in the root directory.
The -s option allows you to change the name of the .snapshots directory. For complete usage information,
see the topic mmsnapdir command in the IBM Spectrum Scale: Command and Programming Reference.
To illustrate this point, assume that a GPFS file system called fs1, which is mounted at /fs1, has one
snapshot called snap1. The file system might appear similar to this:
/fs1/userA/file2b
/fs1/userA/file3b
/fs1/.snapshots/snap1/userA/file2b
/fs1/.snapshots/snap1/userA/file3b
To create links to the snapshots from each directory, and to use the name .links instead of .snapshots, enter:
mmsnapdir fs1 -a -s .links
After the command completes, the directory structure would appear similar to:
/fs1/userA/file2b
/fs1/userA/file3b
/fs1/userA/.links/snap1/file2b
/fs1/userA/.links/snap1/file3b
/fs1/.links/snap1/userA/file2b
/fs1/.links/snap1/userA/file3b
To undo the effect of the -a option and revert to a single snapshot directory in the root directory of the
file system, issue:
mmsnapdir fs1 -r
After the command completes, the directory structure is similar to the following:
/fs1/userA/file2b
/fs1/userA/file3b
/fs1/.links/snap1/userA/file2b
/fs1/.links/snap1/userA/file3b
For complete usage information, see the topic mmsnapdir command in the IBM Spectrum Scale: Command
and Programming Reference.
Deleting a snapshot
Use the mmdelsnapshot command to delete GPFS snapshots of a file system.
For example, to delete snap1 for the file system fs1, enter:
mmdelsnapshot fs1 snap1
For complete usage information, see the topic mmdelsnapshot command in the IBM Spectrum Scale: Command
and Programming Reference.
Snapshots can be used in environments where multiple recovery points are necessary. A snapshot can be
taken of file system or fileset data and then the data can be recovered from the snapshot if the production
data becomes unavailable.
Note:
v Snapshots are read-only; changes can be made only to the normal and active files and directories, not
to the snapshot.
v When a snapshot of an independent fileset is taken, only nested dependent filesets are included in the
snapshot.
You can either manually create the snapshots or create snapshot rules to automate the snapshot creation
and retention through the IBM Spectrum Scale GUI. These features are not available through the CLI.
To manually create a snapshot, click Create Snapshot in the Snapshots page and enter the required
details under the Manual tab of the Create Snapshot window. Click Create after entering the details.
By creating a snapshot rule, you can automate the snapshot creation and retention. That is, in a snapshot
rule you can specify a frequency in which the snapshots must be created and the number of snapshots
that must be retained for a period. The retention policy helps to avoid retaining unwanted snapshots,
which wastes storage resources.
The following table provides an example for the values that are specified against these parameters.
Table 43. Example for retention period
Number of Keep latest snapshots for
most recent
Frequency Minute snapshots Hours Days Weeks Months
Hourly 1 2 2 6 2 3
Based on this retention rule, the following snapshots are created and retained on March 20, 2016 at
06:10 AM:
Table 44. Example - Time stamp of snapshots that are retained based on the retention policy
Time stamp Condition based on which snapshot is retained
December 31 (Thursday, 11:01 PM) Keep latest snapshot for last 3 months
January 31 (Sunday, 11:01 PM) Keep latest snapshot for last 3 months
According to this rule, 13 snapshots are retained on March 20, 2016 at 06:10 AM.
If you do not specify a name for the snapshot, a default name is assigned. The default snapshot ID is
generated at creation time by using the format "@GMT-yyyy.MM.dd-HH.mm.ss". If you specify a name and
omit the "@GMT-date-time" format, the snapshot is not identifiable by Windows VSS and files cannot be
restored by that method. In snapshot names, avoid white space, double and single quotation marks,
parentheses (), the asterisk *, the forward slash /, and the backslash \.
Deleting snapshots
To manually delete the snapshots, right-click the snapshot from the Snapshots page and select Delete.
The snapshots that are automatically created based on a snapshot creation rule are deleted automatically
based on the retention period that is specified in the rule. When the condition for deletion is met, the
GUI immediately starts to delete the snapshot candidates.
The peer and recovery point objective (RPO) snapshots are used in the AFM and AFM DR configurations
to ensure data integrity and availability. When a peer snapshot is taken, it creates a snapshot of the cache
fileset and then queues a snapshot creation at the home site. This ensures application consistency at both
cache and home sites. The RPO snapshot is a type of peer snapshot that is used in the AFM DR setup. It
is used to maintain consistency between the primary and secondary sites in an AFM DR configuration.
Use the Create Peer Snapshot option in the Files > Snapshots page to create peer snapshots. You can
view and delete these peer snapshots from the Snapshots page and also from the detailed view of the
Files > Active File Management page.
Chapter 28. Creating and managing file clones
Cloning a file is similar to creating a copy of a file, but the creation process is faster and more space
efficient because no additional disk space is consumed until the clone or the original file is modified.
Multiple clones of the same file can be created with no additional space overhead. You can also create
clones of clones.
Creating a file clone from a regular file is a two-step process using the mmclone command with the snap
and copy keywords:
1. Issue the mmclone snap command to create a read-only snapshot of the file to be cloned. This
read-only snapshot becomes known as the clone parent. For example, the following command creates
a clone parent called snap1 from the original file file1:
mmclone snap file1 snap1
Alternately, if only one file is specified with the mmclone snap command, it will convert the file to a
read-only clone parent without creating a separate clone parent file. When using this method to create
a clone parent, the specified file cannot be open for writing or have hard links. For example, the
following command converts file1 into a clone parent.
mmclone snap file1
2. Issue the mmclone copy command to create a writable clone from a clone parent. For example, the
following command creates a writable file clone called file2 from the clone parent snap1:
mmclone copy snap1 file2
Creating a file clone where the source is in a snapshot only requires one step using the mmclone
command with the copy keyword. For example, the following command creates a writable file clone
called file3.clone from a file called file3 in a snapshot called snap2:
mmclone copy /fs1/.snapshots/snap2/file3 file3.clone
Note: Extended attributes of clone parents are not passed along to file clones.
Additional clones can be created from the same clone parent by issuing additional mmclone copy
commands, for example:
mmclone copy snap1 file3
File clones of clones can also be created, as shown in the following example:
mmclone snap file1 snap1
mmclone copy snap1 file2
echo hello >> file2
mmclone snap file2 snap2
mmclone copy snap2 file3
The echo command updates the last block of file clone file2. When file2 is snapped to snap2, the
mmclone snap operation is performed as described previously. When a block in file3 is read, the clone
parent inode is found first. For the case of the last block, with the hello text, the disk address will be
found in snap2. However, for other blocks, the disk address will be found in snap1.
For complete usage information, see the topic mmclone command in the IBM Spectrum Scale: Command and
Programming Reference.
The show keyword of the mmclone command provides a report to determine the current status of one or
more files. When a file is a clone, the report will show the parent inode number. When a file was cloned
from a file in a snapshot, mmclone show displays the snapshot and fileset information.
Note: There is a brief period of time, immediately following the deletion of the file clone copies, when
deletion of the parent can fail because the clone copy deletions are still running in the background.
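For example, continuing with the files from the earlier example, the following command reports the clone
status of each file, including the parent inode number and the clone depth:
mmclone show file1 file2 file3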
File clones can be split from their clone parents in one of two ways:
v Using the mmclone redirect command to split the file clone from the immediate clone parent only. The
clone child remains a file clone, but the clone parent can be deleted.
v Using the mmclone split command to split the file clone from all clone parents. This converts the
former clone child to a regular file. The clone parent does not change.
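For example, continuing with the earlier clone chain:
mmclone redirect file3
mmclone split file2
The first command splits file3 from its immediate clone parent only, while the second command converts
file2 into a regular file that is independent of all of its clone parents.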
For complete usage information, see the topic mmclone command in the IBM Spectrum Scale: Command and
Programming Reference.
When reading a file clone in the snapshot, the system will distinguish between the states of the clone:
When a snapshot has file clones, those file clones should be deleted or split from their clone parents prior
to deleting the snapshot. See “Deleting file clones” on page 431 and “Splitting file clones from clone
parents” on page 431 for more information. A policy file can be created to help determine if a snapshot
has file clones. See “File clones and policy files” for more information.
The following example shows a policy file that can be created for displaying clone attributes for all files:
RULE EXTERNAL LIST ’x’ EXEC ’’
RULE ’nonClone’ LIST ’x’ SHOW(’nonclone’) WHERE Clone_Parent_Inode IS NULL
RULE ’normalClone’ LIST ’x’ SHOW(
’inum ’ || varchar(Clone_Parent_Inode) ||
’ par ’ || varchar(Clone_Is_Parent) ||
’ psn ’ || varchar(Clone_Parent_Is_Snap) ||
’ dep ’ || varchar(Clone_Depth))
WHERE Clone_Parent_Inode IS NOT NULL AND Clone_Parent_Is_Snap == 0
RULE ’snapClone’ LIST ’x’ SHOW(
’inum ’ || varchar(Clone_Parent_Inode) ||
’ par ’ || varchar(Clone_Is_Parent) ||
’ psn ’ || varchar(Clone_Parent_Is_Snap) ||
’ dep ’ || varchar(Clone_Depth) ||
’ Fid ’ || varchar(Clone_Parent_Fileset_Id) ||
’ snap ’ || varchar(Clone_Parent_Snap_Id))
WHERE Clone_Parent_Inode IS NOT NULL AND Clone_Parent_Is_Snap != 0
If this policy file was called pol.file, the following command would display the clone attributes:
mmapplypolicy fs0 -P pol.file -I defer -f pol -L 0
Note: This feature is available with IBM Spectrum Scale Standard Edition or higher.
To protect a file system against disaster, the following steps must be taken to ensure that all data is safely
stored in a second location:
1. Record the file system configuration with the mmbackupconfig command.
2. Ensure all file data is pre-migrated (see “Pre-migrating files with external storage pools” on page 407
for more information).
3. Perform a metadata image backup with the mmimgbackup command.
The mmbackupconfig command must be run prior to running the mmimgbackup command. No changes to
file system configuration, filesets, quotas, or other settings should be done between running the
mmbackupconfig command and the mmimgbackup command. To recover from a disaster, the
mmrestoreconfig command must be run prior to running the mmimgrestore command. The file system
being restored must have the same inode size and metadata block size as the file system that was backed
up. Use the mmrestoreconfig -F QueryResultFile option to create the QueryResultFile. Use the example
of the mmcrfs command within the QueryResultFile to recreate your file system. After restoring the image
data and adjusting quota settings, the file system can be mounted read-write, and the HSM system
re-enabled to permit file data recall. Users may be permitted to access the file system, and/or the system
administrator can manually recall file data with the IBM Spectrum Protect for Space Management
command dsmrecall.
Throughout these procedures, the sample file system used is called smallfs. Where appropriate, replace
this value with your file system name.
1. Backup the cluster configuration information.
The cluster configuration must be backed up by the administrator. The minimum cluster configuration
information needed is: IP addresses, node names, roles, quorum and server roles, cluster-wide
configuration settings from mmchconfig, cluster manager node roles, remote shell configuration,
mutual ssh and rsh authentication setup, and the cluster UID. More complete configuration
information can be found in the mmsdrfs file and CCR.
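For example (the output location is illustrative), much of this information can be captured with commands
similar to the following and saved in a secure location:
mmlscluster > /secure/location/cluster.info
mmlsconfig >> /secure/location/cluster.info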
2. Preserve disk configuration information.
Disk configuration must also be preserved in order to recover a file system. The basic disk
configuration information needed, for a backup intended for disaster recovery, is the number of disk
volumes that were previously available and the sizes of those volumes. In order to recover from a
complete file system loss, at least as much disk space as was previously available will be needed for
restoration. It is only feasible to restore the image of a file system onto replacement disks if the disk
volumes available are of similar enough sizes to the originals that all data can be restored to the new
disks. At a minimum, the following disk configuration information is needed:
v Disk device names
v Disk device sizes
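For example (the output location is illustrative), the NSD names, device names, and disk sizes can be
recorded with commands similar to the following:
mmlsnsd -X > /secure/location/disk.info
mmdf smallfs >> /secure/location/disk.info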
3. Save the file system configuration information.
The file system configuration is backed up with the mmbackupconfig command; for example:
mmbackupconfig smallfs -o /tmp/smallfs.bkpconfig
Be sure to copy the temporary file that is created by the preceding command to a secure location so
that it can be retrieved and used during a disaster recovery.
4. Pre-migrate all newer file data into secondary storage.
File contents in a space-managed GPFS will reside in secondary storage managed by the HSM. In the
case of IBM Spectrum Protect HSM, disk and tape pools will typically hold the offline images of
migrated files. HSM can also be used to pre-migrate all newer file data into secondary storage, so that
all files will have either a migrated or pre-migrated status (XATTR) recorded, and their current
contents are copied or updated into the secondary storage. The IBM Spectrum Protect command
dsmmigrate can be used as follows:
dsmmigrate -Premigrate -Recursive /smallfs
To optionally check the status of the files that were pre-migrated with the previous command, use the
following command:
dsmls /smallfs/*
5. Create a global snapshot of the live file system, to provide a quiescent image for image backup, using
a command similar to the following:
mmcrsnapshot smallfs smallfssnap
6. Choose a staging area in which to save the GPFS metadata image files.
The image backup process stores each piece of the partial file system image backup in its own file in
the shared work directory typically used by policy runs. These files can become quite large depending
on the number of files in the file system. Also, because the file system holding this shared directory
must be accessible to every node participating in the parallel backup task, it might also be a GPFS file
system. It is imperative that the staging directory chosen be accessible to both the tsapolicy archiver
process and the IBM Spectrum Protect Backup-Archive client. This staging directory is specified with
the -g option of the mmimgbackup command.
7. Backup the file system image.
The following command backs up an image of the GPFS metadata from the file system by using a
parallel policy run and the default IBM Spectrum Protect backup client:
mmimgbackup smallfs -S smallfssnap -g /u/user/backup -N aixnodes
The metadata of the file system, the directories, inodes, attributes, symlinks, and so on are all
captured in parallel by using the archive module extension feature of the mmapplypolicy command.
After completing the parallel execution of the policy-driven archiving process, a collection of image
files in this format will remain. These image files are gathered by the mmimgbackup command and
archived to IBM Spectrum Protect automatically.
If you are using the -N nodes option, it is a good idea for all of the specified nodes to run the same
operating system when running mmimgbackup. Also, the directory that was created with the -g
GlobalWorkDirectory option to store the image files must exist and must be accessible from all the nodes
that are specified.
In order to restore a file system, the configuration data stored from a previous run of mmbackupconfig
and the image files produced from mmimgbackup must be accessible.
Throughout these procedures, the sample file system used is called smallfs. Where appropriate, replace
this value with your file system name.
1. Restore the metadata image files from mmimgbackup and the backup configuration data from
mmbackupconfig with a dsmc command similar to the following:
dsmc restore -subdir=yes /u/user/backup/8516/
2. Retrieve the base file system configuration information.
Use the mmrestoreconfig command to generate a configuration file, which contains the details of the
former file system:
mmrestoreconfig Device -i InputFile -F QueryResultFile
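For example, with the configuration data restored in step 1 (the paths are illustrative):
mmrestoreconfig smallfs -i /u/user/backup/smallfs.bkpconfig -F /tmp/smallfs.queryfile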
3. Recreate NSDs if they are missing.
Using the output file generated in the previous step as a guide, the administrator might need to
recreate NSD devices for use with the restored file system. In the output file, the NSD configuration
section contains the NSD information; for example:
######## NSD configuration ##################
## Disk descriptor format for the mmcrnsd command.
## Please edit the disk and desired name fields to match
## your current hardware settings.
##
## The user then can uncomment the descriptor lines and
## use this file as input to the -F option.
#
# %nsd:
# device=DiskName
# nsd=nsd8
# usage=dataAndMetadata
# failureGroup=-1
# pool=system
#
If changes are needed, edit the file in a text editor and follow the included instructions to use it as
input to the mmcrnsd command, then issue the following command:
mmcrnsd -F StanzaFile
4. Recreate the base file system.
The administrator must recreate the initial file system. The output query file specified in the
previous commands can be used as a guide. The following example shows the section of this file
that is needed when recreating the file system:
######### File system configuration #############
## The user can use the predefined options/option values
## when recreating the file system. The option values
## represent values from the backed up file system.
#
# mmcrfs FS_NAME NSD_DISKS -j cluster -k posix -Q yes -L 4194304 --disable-fastea
#  -T /fs2 -A no --inode-limit 278016
#
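After the base file system is re-created, the metadata image is restored with the mmimgrestore command.
The following invocation is a sketch only, assuming that the image files restored in step 1 reside under
/u/user/backup; see the mmimgrestore topic in the IBM Spectrum Scale: Command and Programming
Reference for the exact syntax:
mmimgrestore smallfs -g /u/user/backup -N aixnodes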
GPFS provides a number of features that facilitate the implementation of highly-available GPFS
environments capable of withstanding catastrophic hardware failures. By maintaining a replica of the file
system's data at a geographically-separate location, the system sustains its processing using the secondary
replica of the file system in the event of a total failure in the primary environment.
The primary advantage of both synchronous mirroring methods is the minimization of the risk of
permanent data loss. Both methods provide two consistent, up-to-date replicas of the file system, each
available for recovery if the other one fails. However, inherent to all solutions that synchronously mirror
data over a wide area network link is the latency penalty that is induced by the replicated write I/Os.
This makes both synchronous mirroring methods prohibitively inefficient for certain types of
performance-oriented applications where there is a longer distance between sites. The asynchronous
method effectively eliminates this penalty but in a situation where the primary site is lost, there might be
updates that have not yet been transferred to the secondary site. Asynchronous replication will still
provide a crash consistent and restartable copy of the primary data.
Different storage-level replication capabilities are available on both IBM and non-IBM storage systems.
IBM provides storage-level replication functionality on the following platforms:
v The DS8000 provides synchronous replication with Metro Mirror and asynchronous replication with
Global Mirror. Three and four site replication topologies are also possible by combining these functions.
For more information, see IBM DS8000 series V7.2 documentation at http://www-01.ibm.com/
support/knowledgecenter/HW213_7.2.0/com.ibm.storage.ssic.help.doc/f2c_ichomepage.htm.
v The Storwize family of storage systems also provides a synchronous replication capability with Metro
Mirror and has two versions of asynchronous replication called Global Mirror and Global Mirror with
Change Volumes. Point in Time copy functionality is provided by FlashCopy. For more information, see
IBM Storwize V7000 at http://www-01.ibm.com/support/knowledgecenter/ST3FR7/welcome.
v The XIV provides both synchronous and asynchronous Remote Replication and also provides point in
time copy functionality referred to as Snapshot. For more information, see IBM XIV Storage System
documentation at http://www-01.ibm.com/support/knowledgecenter/STJTAG/
com.ibm.help.xivgen3.doc/xiv_kcwelcomepage.html.
Note: In this document, synchronous replication is referred to as Metro Mirror, asynchronous replication
is referred to as Global Mirror, and point in time copy functionality is referred to as FlashCopy.
A group of volumes that share a common recovery point is commonly called a consistency group. The
storage controller ensures that after a failure, all of the volumes within a consistency group are recovered
to the same point in time.
When using storage-based replication with IBM Spectrum Scale, it is important to ensure that all the
NSDs in a file system are contained within the same consistency group. This way, the metadata NSDs
are always in sync with the data NSDs after a failure.
For this reason, users are asked to zone their SAN configurations such that at most one replica of any
given GPFS disk is visible from any node. That is, the nodes in your production cluster should have
access to the disks that make up the actual file system but should not see the disks that hold the
replicated copies, whereas the backup server should see the replication targets but not the originals.
Alternatively, you can use the nsddevices user exit located in /var/mmfs/etc/ to explicitly define the
subset of the locally visible disks to be accessed during the NSD device scan on the local node.
The following procedure is used to define an nsddevices user exit file to instruct GPFS to use a specific
disk diskA1 rather than other copies of this device, which might also be available:
echo "echo diskA1 hdisk" > /var/mmfs/etc/nsddevices chmod 744 /var/mmfs/etc/nsddevices
The data and metadata replication features of GPFS are used to maintain a secondary copy of each file
system block, relying on the concept of disk failure groups to control the physical placement of the
individual copies:
1. Separate the set of available disk volumes into two failure groups. Define one failure group at each of
the active production sites.
2. Create a replicated file system. Specify a replication factor of 2 for both data and metadata.
When allocating new file system blocks, GPFS always assigns replicas of the same block to distinct failure
groups. This provides a sufficient level of redundancy allowing each site to continue operating
independently should the other site fail.
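A minimal sketch of these two steps, assuming a stanza file named clusterDisks that assigns the disks at the two sites to failure groups 1 and 2, might look like this:
mmcrnsd -F clusterDisks
mmcrfs fs0 -F clusterDisks -m 2 -M 2 -r 2 -R 2
Here -m 2 and -M 2 set the default and maximum number of metadata replicas to two, and -r 2 and -R 2 do the same for data replicas.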
GPFS enforces a node quorum rule to prevent multiple nodes from assuming the role of the file system
manager in the event of a network partition. Thus, a majority of quorum nodes must remain active in
order for the cluster to sustain normal file system usage. Furthermore, GPFS uses a quorum replication
algorithm to maintain the content of the file system descriptor (one of the central elements of the GPFS
metadata). When formatting the file system, GPFS assigns some number of disks (usually three) as the
descriptor replica holders that are responsible for maintaining an up-to-date copy of the descriptor.
Similar to the node quorum requirement, a majority of the replica holder disks must remain available at
all times to sustain normal file system operations. This file system descriptor quorum is internally
controlled by the GPFS daemon. However, when a disk has failed due to a disaster you must manually
inform GPFS that the disk is no longer available and it should be excluded from use.
This three-site configuration is resilient to a complete failure of any single hardware site. Should all disk
volumes in one of the failure groups become unavailable, GPFS performs a transparent failover to the
remaining set of disks and continues serving the data to the surviving subset of nodes with no
administrative intervention. While nothing prevents you from placing the tiebreaker resources at one of
the active sites, to minimize the risk of double-site failures it is suggested you install the tiebreakers at a
third, geographically distinct location.
Note: If you create an ignoreAnyMount.<file_system_name> file, you cannot manually mount the file
system on the tiebreaker node.
If you do not follow these practices, an unexpected file system unmount can occur during site failures,
because of the configuration of the tiebreaker node and the unmountOnDiskFail option.
v In a stretch cluster environment, designate at least one quorum node from each site as a manager node.
During site outages, the quorum nodes can take over as manager nodes.
Note: There are no special networking requirements for this configuration. For example:
v You do not need to create different subnets.
v You do not need to have GPFS nodes in the same network across the two production sites.
v The production sites can be on different virtual LANs (VLANs).
The high-level organization of a replicated GPFS cluster for synchronous mirroring where all disks are
directly attached to all nodes in the cluster is shown in Figure 13. An alternative to this design would be
to have the data served through designated NSD servers.
With GPFS release 4.1.0, a new, more fault-tolerant configuration mechanism has been introduced as the
successor for the server-based mechanisms. The server-based configuration mechanisms consist of two
configuration servers specified as the primary and secondary cluster configuration server. The new
configuration mechanism uses all specified quorum nodes in the cluster to hold the GPFS configuration
and is called CCR (Clustered Configuration Repository). The CCR is used by default during cluster
creation unless the CCR is explicitly disabled. The mmlscluster command reports the configuration
mechanism in use in the cluster.
The following sections describe the differences regarding disaster recovery for the two configuration
mechanisms.
Note: The cluster is created with the Cluster Configuration Repository (CCR) enabled. This option
is the default on IBM Spectrum Scale v4.1 or later.
2. Issue the following command to enable the unmountOnDiskFail attribute on nodeC:
mmchconfig unmountOnDiskFail=yes -N nodeC
Enabling this attribute prevents false disk errors in the SAN configuration from being reported to the
file system manager.
Important: In a synchronous replication environment, the following rules are good practices:
v The following rules apply to nodeC, which is the only node on site C and is also a client node and
a quorum node:
– Do not designate the nodeC as a manager node.
– Do not mount the file system on nodeC.
To avoid unexpected mounts, create the following empty file on nodeC:
/var/mmfs/etc/ignoreAnyMount.<file_system_name>
For example, if the file system is fs0, create the following empty file:
/var/mmfs/etc/ignoreAnyMount.fs0
Note: If you create an ignoreAnyMount.<file_system_name> file, you cannot manually mount the
file system on nodeC.
Important: Note that the stanzas make the following failure group assignments:
v The disks at site A are assigned to failure group 1.
v The disks at site B are assigned to failure group 2.
v The disk that is local to nodeC is assigned to failure group 3.
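For reference, a stanza file with these failure group assignments might look like the following sketch; the device and server names are illustrative and follow the naming used in this example:
%nsd: device=/dev/diskA1
servers=nodeA001,nodeA002
usage=dataAndMetadata
failureGroup=1
%nsd: device=/dev/diskB1
servers=nodeB001,nodeB002
usage=dataAndMetadata
failureGroup=2
%nsd: device=/dev/diskC1
servers=nodeC
usage=descOnly
failureGroup=3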
b. Issue the following command to create the NSDs:
mmcrnsd –F clusterDisks
c. Issue the following command to verify that the network shared disks are created:
mmlsnsd -m
You can now use node class names with IBM Spectrum Scale commands ("mm" commands) to recover
sites easily after a cluster failover and failback. For example, with the following command you can
bring down all the nodes on site B with one parameter, rather than having to pass all the node names
for site B into the command:
mmshutdown -N gpfs.siteB
For information on the recovery procedure, see “Failback with temporary loss using the Clustered
Configuration Repository (CCR) configuration mechanism” on page 449.
The cluster is configured with synchronous replication to recover from a site failure.
Existing quorum designations must be relaxed in order to allow the surviving site to fulfill quorum
requirements:
1. To relax node quorum, temporarily change the designation of each of the failed quorum nodes to
non-quorum nodes. Issue the mmchnode --nonquorum command.
2. To relax file system descriptor quorum, temporarily eliminate the failed disks from the group of disks
to which the GPFS daemon writes the file system descriptor. Issue the mmfsctl
exclude command for each of the failed disks.
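For example, if site A with quorum nodes nodeA001, nodeA002, and nodeA003 and the disks gpfs1nsd and gpfs2nsd were lost, the commands might look like the following sketch (the node and disk names follow the examples used elsewhere in this chapter):
mmchnode --nonquorum -N nodeA001,nodeA002,nodeA003
mmfsctl fs0 exclude -d "gpfs1nsd;gpfs2nsd"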
While the GPFS cluster is in a failover state, it is suggested that no changes to the GPFS configuration be
made. If the server-based configuration mechanism is in use, changes to your GPFS configuration require
both cluster configuration servers to be operational. If the two servers are not both operational, the sites
would have distinct, and possibly inconsistent, copies of the GPFS mmsdrfs configuration data file. While the
servers can be migrated to the surviving site, it is best to avoid this step if the disaster does not leave the
affected site permanently disabled.
If it becomes absolutely necessary to modify the GPFS configuration while in failover mode, for example
to relax quorum, you must ensure that all nodes at the affected site are powered down and left in a
stable inactive state. They must remain in that state until the decision is made to execute the failback
procedure. As a precaution, we suggest disabling the GPFS autoload option on all nodes to
prevent GPFS from bringing itself up automatically on the affected nodes should they come up
spontaneously at some point after a disaster.
Following a disaster, which failover process is implemented depends upon whether or not the tiebreaker
site is affected.
The proposed three-site configuration is resilient to a complete failure of any single hardware site. Should
all disk volumes in one of the failure groups become unavailable, GPFS performs a transparent failover to
the remaining set of disks and continues serving the data to the surviving subset of nodes with no
administrative intervention.
Failover with the loss of tiebreaker site C with Clustered Configuration Repository (CCR) in use
Make no further changes to the quorum designations at site B until the failed sites are back on line and
the following failback procedure has been completed.
Do not shut down the current set of nodes on the surviving site B and restart operations on the failed
sites A and C. This will result in a non-working cluster.
Failback procedures:
Which failback procedure you follow depends upon whether the nodes and disks at the affected site have
been repaired or replaced.
If the disks have been repaired, you must also consider the state of the data on the failed disks:
v For nodes and disks that have been repaired and you are certain the data on the failed disks has not
been changed, follow either:
– failback with temporary loss and no configuration changes
– failback with temporary loss and configuration changes
Delayed failures: In certain failure cases the loss of data may not be immediately apparent. For example,
consider this sequence of events:
1. Site B loses connectivity with sites A and C.
2. Site B then goes down due to loss of node quorum.
3. Sites A and C remain operational long enough to modify some of the data on disk but suffer a
disastrous failure shortly afterwards.
4. Node and file system descriptor quorums are overridden to enable access at site B.
Now the two replicas of the file system are inconsistent and the only way to reconcile these copies during
recovery is to:
1. Remove the damaged disks at sites A and C.
2. Either replace the disk and format a new NSD or simply reformat the existing disk if possible.
3. Add the disk back to the file system, performing a full resynchronization of the file system's data and
metadata and restore the replica balance using the mmrestripefs command.
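A sketch of the corresponding commands, assuming the damaged disk is gpfs2nsd in file system fs0 and that its replacement is described in a stanza file named newDisk (both names are illustrative):
mmdeldisk fs0 gpfs2nsd
mmcrnsd -F newDisk
mmadddisk fs0 -F newDisk
mmrestripefs fs0 -b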
If the outage was of a temporary nature and your configuration has not been altered, it is a simple
process to fail back to the original state.
After all affected nodes and disks have been repaired and you are certain the data on the failed disks has
not been changed:
1. Start GPFS on the repaired nodes where the file gpfs.sitesAC lists all of the nodes at sites A and C:
mmstartup -N gpfs.sitesAC
2. Restart the affected disks. If more than one disk in the file system is down, they must all be started at
the same time:
mmchdisk fs0 start -a
Failback with temporary loss and configuration changes in the server-based configuration:
If the outage was of a temporary nature and your configuration has been altered, follow this procedure to
fail back to the original state in case primary and secondary configuration servers are in use.
After all affected nodes and disks have been repaired and you are certain that the data on the failed disks
has not been changed:
1. Ensure that all nodes have the latest copy of the mmsdrfs file:
mmchcluster -p LATEST
For more information about the mmsdrfs file, see Recovery from loss of GPFS cluster configuration data file
in the IBM Spectrum Scale: Problem Determination Guide.
2. Migrate the primary cluster configuration server back to site A:
mmchcluster -p nodeA001
3. Restore node quorum designations at sites A and C:
mmchnode --quorum -N nodeA001,nodeA002,nodeA003,nodeC
4. Start GPFS on the repaired nodes where the file gpfs.sitesAC lists all of the nodes at sites A and C:
mmstartup -N gpfs.sitesAC
5. Restore the file system descriptor quorum by informing GPFS to include the repaired disks:
mmfsctl fs0 include -d "gpfs1nsd;gpfs2nsd;gpfs5nsd"
Failback with temporary loss using the Clustered Configuration Repository (CCR) configuration mechanism:
If the outage was of a temporary nature and your configuration has been altered, follow this procedure to
fail back to the original state when the Clustered Configuration Repository (CCR) configuration scheme
is in use.
After all affected nodes and disks have been repaired and you are certain the data on the failed disks has
not been changed, complete the following steps.
1. Shut down the GPFS daemon on the surviving nodes at site B, and on the formerly failed and now
recovered sites A and C, where the file gpfs.siteB lists all of the nodes at site B and the file
gpfs.siteA lists all of the nodes at site A and the tiebreaker node at site C:
mmshutdown -N gpfs.siteB
mmshutdown -N gpfs.siteA
mmshutdown -N nodeC
2. From site B, restore the original node quorum designation for the tiebreaker node at site C, and start GPFS on site B and on site C:
mmstartup -N gpfs.siteB
mmchnode --quorum -N nodeC
mmstartup -N nodeC
3. From site B, restore the original node quorum designations for site A, and start GPFS on site A:
mmchnode --quorum -N nodeA001,nodeA002,nodeA003
mmstartup -N gpfs.siteA
4. Restore the file system descriptor quorum by informing GPFS to include the repaired disks:
mmumount fs0 -a;mmfsctl fs0 include -d "gpfs1nsd;gpfs2nsd;gpfs5nsd"
5. Mount the file system on all nodes at sites A and B.
Note: Do not allow the failed sites A and C to come online at the same time or when site B is
unavailable or not functional.
6. Bring the disks online and restripe the file system across all disks in the cluster to restore the initial
replication properties:
mmchdisk fs0 start -a
mmrestripefs fs0 -b
If an outage is of a permanent nature, follow steps to remove and replace the failed resources, and then
resume the operation of GPFS across the cluster.
1. Remove the failed resources from the GPFS configuration
2. Replace the failed resources, then add the new resources into the configuration
3. Resume the operation of GPFS across the entire cluster
Assume that sites A and C have had permanent losses. To remove all references of the failed nodes and
disks from the GPFS configuration and replace them:
%nsd: device=/dev/diskA2
servers=nodeA003,nodeA002
usage=dataAndMetadata
failureGroup=1
%nsd: device=/dev/diskC1
servers=nodeC
usage=descOnly
failureGroup=3
mmcrnsd -F clusterDisksAC
f. Add the new NSDs to the file system specifying the -r option to rebalance the data on all disks:
mmadddisk fs0 -F clusterDisksAC -r
Synchronous replication in the storage layer continuously updates a secondary (target) copy of a disk
volume to match changes made to a primary (source) volume. A pair of volumes are configured in a
replication relationship, during which all write operations performed on the source are synchronously
mirrored to the target device.
The synchronous replication protocol guarantees that the secondary copy is constantly up-to-date by
ensuring that the primary copy is written only if the primary storage subsystem received an
acknowledgment that the secondary copy has been written. The paired volumes typically reside on two
distinct and geographically separated storage systems communicating over a SAN or LAN link.
Once the operation of the original primary volume has been restored, a failback is executed to
resynchronize the content of the two volumes. The original source volume is switched to the target mode,
after which all modified data tracks (those recorded in the modification bitmap) are copied from the
original target disk. The volume pair can then be suspended again and a similar process performed to
reverse the volumes' roles, thus bringing the pair into its initial state.
The high-level organization of an active/active GPFS cluster using hardware replication is illustrated in
Figure 14 on page 452. A single GPFS cluster is created over three sites. The data is mirrored between two
active sites, with a cluster configuration server residing at each site and a tiebreaker quorum node
installed at the third location. The presence of an optional tiebreaker node allows the surviving site to
satisfy the node quorum requirement if the other production site fails.
The GPFS configuration resides either on the two configuration servers (primary and secondary), when
the cluster has been created with the Clustered Configuration Repository (CCR) disabled (mmcrcluster),
or on each quorum node, when the Clustered Configuration Repository (CCR) is enabled.
Figure 14. A synchronous active-active replication-based mirrored GPFS configuration with a tiebreaker site
To establish an active-active GPFS cluster using hardware replication with a tiebreaker site as shown in
Figure 14, consider the configuration:
Site A (production site)
Consists of:
v Nodes – nodeA001, nodeA002, nodeA003, nodeA004
v Storage subsystems – A
v Disk volumes – diskA on storage system A
diskA is SAN-attached and accessible from sites A and B
Site B (recovery site)
Consists of:
v Nodes – nodeB001, nodeB002, nodeB003, nodeB004
Failover to the recovery site and subsequent failback for an active/active configuration:
For an active-active storage replication (PPRC) based cluster, complete these steps to restore access to the
file system through site B after site A has experienced a disastrous failure:
1. Stop the GPFS daemon on the surviving nodes at site B, where the file gpfs.siteB lists all of the nodes
at site B:
mmshutdown -N gpfs.siteB
2. Perform the appropriate commands to make the secondary replication devices available and change
their status from being secondary devices to suspended primary devices.
3. If site C, the tiebreaker, failed along with site A, existing node quorum designations must be relaxed
in order to allow the surviving site to fulfill quorum requirements. To relax node quorum, temporarily
change the designation of each of the failed quorum nodes to nonquorum nodes using the --force
option:
mmchnode --nonquorum -N nodeA001,nodeA002,nodeA003,nodeC --force
4. Ensure that the source volumes are not accessible to the recovery site:
v Disconnect the cable.
v Define the nsddevices user exit file to exclude the source volumes.
5. Restart the GPFS daemon on all surviving nodes:
mmstartup -N gpfs.siteB
Note:
v Make no further changes to the quorum designations at site B until the failed sites are back on line and
the following failback procedure has been completed.
v Do not shut down the current set of nodes on the surviving site B and restart operations on the failed
sites A and C. This will result in a non-working cluster.
Failback procedure
After the operation of site A has been restored, the failback procedure is completed to restore the access
to the file system from that location. The following procedure is the same for both configuration schemes
(server-based and Clustered Configuration Repository (CCR)). The failback operation is a two-step
process:
1. For each of the paired volumes, resynchronize the pairs in the reverse direction, with the recovery
LUN diskB acting as the source for the production LUN diskA. An incremental resynchronization is
performed, which identifies the mismatching disk tracks, whose content is then copied from the
recovery LUN to the production LUN. Once the data has been copied and the replication is running
in the reverse direction, this configuration can be maintained until a time is chosen to switch back to
site A.
2. Shut GPFS down at site B and reverse the disk roles (the original primary disk becomes the primary
again), bringing the replication pair to its initial state.
a. Stop the GPFS daemon on all nodes.
b. Perform the appropriate actions to switch the replication direction so that diskA is now the source
and diskB is the target.
c. If during failover you migrated the primary cluster configuration server to a node in site B:
1) Migrate the primary cluster configuration server back to site A:
mmchcluster -p nodeA001
2) Restore the initial quorum assignments:
mmchnode --quorum -N nodeA001,nodeA002,nodeA003,nodeC
3) Ensure that all nodes have the latest copy of the mmsdrfs file:
mmchcluster -p LATEST
d. Ensure the source volumes are accessible to the recovery site:
v Reconnect the cable
v Edit the nsddevices user exit file to include the source volumes
e. Start the GPFS daemon on all nodes:
mmstartup -a
f. Mount the file system on all the nodes at sites A and B.
A GPFS file system is defined over a set of disk volumes located at the production site and these disks
are mirrored using storage replication to a secondary set of volumes located at the recovery site. During
normal operation, only the nodes in the production GPFS cluster mount and access the GPFS file system
at any given time, which is the primary difference between a configuration of this type and the
active-active model.
In the event of a catastrophe in the production cluster, the storage replication target devices are made
available to be used by the nodes in the recovery site.
The secondary replica is then mounted on nodes in the recovery cluster as a regular GPFS file system,
thus allowing the processing of data to resume at the recovery site. At a later point, after restoring the
physical operation of the production site, we execute the failback procedure to resynchronize the content
of the replicated volume pairs between the two clusters and re-enable access to the file system in the
production environment.
The high-level organization of synchronous active-passive storage replication based GPFS cluster is
shown in Figure 15 on page 456.
P - primary cluster configuration server
S - secondary cluster configuration server
q - quorum node
Figure 15. A synchronous active-passive storage replication-based GPFS configuration without a tiebreaker site
To establish an active-passive storage replication GPFS cluster as shown in Figure 15, consider the
configuration:
Production site
Consists of:
v Nodes – nodeP001, nodeP002, nodeP003, nodeP004, nodeP005
v Storage subsystems – Storage System P
v LUN IDs and disk volume names – lunP1 (hdisk11), lunP2 (hdisk12), lunP3 (hdisk13), lunP4
(hdisk14)
Recovery site
Consists of:
v Nodes – nodeR001, nodeR002, nodeR003, nodeR004, nodeR005
v Storage subsystems – Storage System R
v LUN ids and disk volume names – lunR1 (hdisk11), lunR2 (hdisk12), lunR3 (hdisk13), lunR4
(hdisk14)
All disks are SAN-attached and directly accessible from all local nodes.
1. Establish synchronous PPRC volume pairs using the copy entire volume option:
lunP1-lunR1 (source-target)
lunP2-lunR2 (source-target)
lunP3-lunR3 (source-target)
lunP4-lunR4 (source-target)
Failover to the recovery site and subsequent failback for an active-passive configuration:
For an active-passive storage replication based cluster, complete these steps to fail over production to the
recovery site.
Note: Make no further changes to the quorum designations at site B until the failed sites are back on line
and the following failback procedure has been completed. Do not shut down the current set of nodes on
the surviving site B and restart operations on the failed sites A and C. This will result in a non-working
cluster.
Failback procedure
After the physical operation of the production site has been restored, complete the failback procedure to
transfer the file system activity back to the production GPFS cluster. The following procedure is the same
for both configuration schemes (server-based and Clustered Configuration Repository (CCR)). The
failback operation is a two-step process:
1. For each of the paired volumes, resynchronize the pairs in the reverse direction, with the recovery
LUN lunRx acting as the source for the production LUN lunPx. An incremental resynchronization
is performed, which identifies the mismatching disk tracks, whose content is then copied from
the recovery LUN to the production LUN. Once the data has been copied and the replication is
running in the reverse direction, this configuration can be maintained until a time is chosen to switch
back to site P.
2. If the state of the system configuration has changed, update the GPFS configuration data in the
production cluster to propagate the changes made while in failover mode. From a node at the
recovery site, issue:
mmfsctl all syncFSconfig -n gpfs.sitePnodes
3. Stop GPFS on all nodes in the recovery cluster and reverse the disk roles so the original primary disks
become the primaries again:
a. From a node in the recovery cluster, stop the GPFS daemon on all nodes in the recovery cluster:
mmshutdown -a
Several uses of the FlashCopy replica after its initial creation can be considered. For example, if your
primary operating environment suffers a permanent loss or a corruption of data, you may choose to flash
the target disks back onto the originals to quickly restore access to a copy of the file system as seen at the
time of the previous snapshot. Before restoring the file system from a FlashCopy, please make sure to
suspend the activity of the GPFS client processes and unmount the file system on all GPFS nodes.
FlashCopies also can be used to create a copy of data for disaster recovery testing and in this case are
often taken from the secondary devices of a replication pair.
When a FlashCopy disk is first created, the subsystem establishes a control bitmap that is subsequently
used to track the changes between the source and the target disks. When processing read I/O requests
sent to the target disk, this bitmap is consulted to determine whether the request can be satisfied using
the target's copy of the requested block. If the track containing the requested data has not yet been
copied, the source disk is instead accessed and its copy of the data is used to satisfy the request.
To prevent the appearance of out-of-order updates, it is important to consider data consistency when
using FlashCopy. When taking the FlashCopy image all disk volumes that make up the file system must
be copied so that they reflect the same logical point in time. Two methods may be used to provide for
data consistency in the FlashCopy image of your GPFS file system. Both techniques guarantee the
consistency of the FlashCopy image by means of a temporary suspension of I/O, but either can be seen
as the preferred method depending on your specific requirements and the nature of your GPFS client
application.
FlashCopy provides for the availability of the file system's on-disk content in another GPFS cluster. But in
order to make the file system known and accessible, you must issue the mmfsctl syncFSConfig
command to:
v Import the state of the file system's configuration from the primary location.
v Propagate all relevant changes to the configuration in the primary cluster to its peer to prevent the
risks of discrepancy between the peer's mmsdrfs file and the content of the file system descriptor
found in the snapshot.
It is suggested you generate a new FlashCopy replica immediately after every administrative change to
the state of the file system. This eliminates the risk of a discrepancy between the GPFS configuration
data contained in the mmsdrfs file and the on-disk content of the replica.
The use of FlashCopy consistency groups provides for the proper ordering of updates, but this method
does not by itself suffice to guarantee the atomicity of updates as seen from the point of view of the user
application. If the application process is actively writing data to GPFS, the on-disk content of the file
system may, at any point in time, contain some number of incomplete data record updates and possibly
some number of incomplete metadata updates.
The FlashCopy consistency group mechanism is used to freeze the source disk volumes at the logical
instant at which their logical image appears on the target disk volumes. The appropriate storage system
documentation should be consulted to determine how to invoke the Point in Time Copy with the
consistency group option:
Run the establish FlashCopy pair task with the freeze FlashCopy consistency group option. Create the
volume pairs:
lunS1 – lunT1 (source-target)
lunS2 – lunT2 (source-target)
Note: This feature is available with IBM Spectrum Scale Standard Edition or higher.
The participating nodes are designated as Cluster NFS (CNFS) member nodes and the entire setup is
frequently referred to as CNFS or a CNFS cluster.
In this solution, all CNFS nodes export the same file systems to the NFS clients. When one of the CNFS
nodes fails, the NFS serving load moves from the failing node to another node in the CNFS cluster.
Failover is done using recovery groups to help choose the preferred node for takeover. For the NFS client
node to experience a seamless failover, hard mounts must be used. The use of soft mounts will likely
result in stale NFS file handle conditions when a server experiences a problem, even though CNFS
failover will still be done.
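For example, on a Linux NFS client, a hard mount of a CNFS export might look like the following, where the server name and paths are illustrative:
mount -t nfs -o hard cnfs.example.com:/gpfs/fs0/export1 /mnt/export1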
Currently, CNFS is supported only in the Linux environment. For an up-to-date list of supported
operating systems, specific distributions, and other dependencies, refer to the IBM Spectrum Scale FAQ in
IBM Knowledge Center (www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html).
NFS monitoring
Every node in the CNFS cluster runs a separate GPFS utility that monitors GPFS, NFS, and networking
components on the node. Upon failure detection and based on your configuration, the monitoring utility
might invoke a failover.
While an NFS server is in a grace period, the NFS monitor sets the server's NFS state to "Degraded".
NFS failover
As part of GPFS recovery, the CNFS cluster failover mechanism is invoked. It transfers the NFS serving
load that was served by the failing node to another node in the CNFS cluster. Failover is done using
recovery groups to help choose the preferred node for takeover.
The failover mechanism is based on IP address failover. The CNFS IP address is moved from the failing
node to a healthy node in the CNFS cluster. In addition, it guarantees NFS lock (NLM) recovery.
Failover processing may involve rebooting of the problem node. To minimize the effects of the reboot, it
is recommended that the CNFS nodes be dedicated to that purpose and not be used to run other critical
processes. CNFS node rebooting should not be disabled, or failover reliability will be severely
impacted.
The GPFS cluster can be defined over an IPv4 or IPv6 network. The IP addresses specified for CNFS can
also be IPv4 or IPv6. The GPFS cluster and CNFS are not required to be on the same version of IP, but
IPv6 must be enabled on GPFS to support IPv6 on CNFS.
CNFS setup
You can set up a clustered NFS environment within a GPFS cluster.
where:
ip_address_list
Is a comma-separated list of host names or IP addresses to be used for GPFS cluster NFS serving.
node
Identifies a GPFS node to be added to the CNFS cluster.
For more information, see the topic mmchnode command in the IBM Spectrum Scale: Command and
Programming Reference.
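For example, designating a node as a CNFS member with a single CNFS IP address might look like the following sketch; the address and node name are assumptions, and the exact option syntax is described in the mmchnode command reference:
mmchnode --cnfs-interface=10.1.1.150 -N nfsnode1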
5. Use the mmchconfig command to configure the optional CNFS parameters.
cnfsMountdPort=mountd_port
Specifies the port number to be used for the rpc.mountd daemon.
For CNFS to work correctly with the automounter (AMD), the rpc.mountd daemon on the
different nodes must be bound to the same port.
cnfsNFSDprocs=nfsd_procs
Specifies the number of nfsd kernel threads. The default is 32.
cnfsVersions=nfs_versions
Specifies a comma-separated list of protocol versions that CNFS should start and monitor. The
default is 3,4. If you are not using NFS v3 and NFS v4, specify this parameter with the
appropriate values for your configuration.
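For example, the following commands set illustrative values for these parameters (the port number and thread count shown are assumptions, not recommendations):
mmchconfig cnfsMountdPort=4002
mmchconfig cnfsNFSDprocs=64
mmchconfig cnfsVersions=3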
CNFS administration
This topic describes some common CNFS administration tasks, along with a sample configuration.
Note: This operation affects only the high-availability aspects of the CNFS functionality. Normal NFS
exporting of the data from the node is not affected. All currently defined CNFS IP addresses remain
unchanged. There will be no automatic failover from or to this node in case of a failure. If failover is
desired, GPFS should be shut down on the affected node prior to issuing the mmchnode command.
Note: If the GPFS daemon is running on a node on which CNFS is being re-enabled, the node will try to
activate its CNFS IP address. If the IP address is currently on some other CNFS-enabled node, that
activation would include a takeover.
Note: This feature is available with IBM Spectrum Scale Standard Edition or higher.
CES is an alternate approach to using a clustered Network File System (CNFS) to export GPFS file
systems. For more information about CES and protocol configuration, see Chapter 2, “Configuring the
CES and protocol configuration,” on page 25.
CES features
To successfully use Cluster Export Services (CES), you must consider function prerequisites, setup and
configuration, failover/failback policies, and other management and administration requirements.
The CES shared root (cesSharedRoot) directory is needed for storing CES shared configuration data,
protocol recovery, and some other protocol-specific purposes. It is part of the Cluster Export
Configuration and is shared between the protocols. Every CES node requires access to the path that is
configured as shared root.
To update the CES shared root directory, you must shut down the cluster, set the CES shared root
directory, and start the cluster again:
mmshutdown -a
mmchconfig cesSharedRoot=shared_root_path
mmstartup -a
The recommended configuration for the CES shared root directory is a dedicated file system, but it can
also reside in an existing GPFS file system. In any case, the CES shared root directory must be on GPFS
and must be available when it is configured through the mmchconfig command.
The CES shared root directory must be defined before protocol nodes can be enabled. To enable protocol
nodes, use the following command:
mmchnode --ces-enable -N Node1[,Node2...]
Preparing to perform service actions on the CES shared root directory file system
The CES shared root directory file system must be kept available for protocols operation to function. If a
service action is to be performed on the CES shared root directory file system, perform the steps that
follow.
Commands such as mmshutdown, mmstartup, and mmmount can take the cesnodes node class as a
parameter to ensure that they operate on all protocol nodes.
The following steps are used to perform service actions on the CES shared root file system:
Note: Only protocol nodes need to be shut down for service of the CES shared root directory file
system. However, other nodes may need to unmount the file system, depending on what service is
being performed.
1. Shut down GPFS on all protocol nodes:
mmshutdown -N cesnodes
Protocol nodes are now ready for service actions to be performed on the CES shared root directory or the
nodes themselves. To recover from a service action:
1. Start up GPFS on all protocol nodes:
mmstartup -N cesnodes
2. Make sure that the CES shared root directory file system is mounted on all protocol nodes:
mmmount cesSharedRoot -N cesnodes
3. Verify that all protocol services have been started:
mmces service list -a
To add CES IP addresses to the address pool, use the mmces command:
mmces address add --ces-ip Address[,Address...]
By default, addresses are distributed among the CES nodes, but a new address can be assigned to a
particular node:
mmces address add --ces-ip Address[,Address...] --ces-node Node
After a CES IP address is added to the address pool, you can manually move the address to a particular
node:
mmces address move --ces-ip Address[,Address...] --ces-node Node
Removing an address while there are clients connected causes the clients to lose those connections. Any
reconnection to the removed IP results in a failure. If DNS is used to map a name entry to one or more IP
addresses, update the DNS to ensure that a client is not presented an address that was already removed
from the pool. This process might also include invalidation of any DNS caches.
The CES addresses that are assigned to the CES nodes are implemented as IP aliases. Each network
adapter that hosts CES addresses must already be configured (with different non-CES IPs) in
/etc/sysconfig. CES uses the netmask to figure out which interfaces to use. For example, if eth1 is
10.1.1.1 and eth2 is 9.1.1.1, then the CES IP 10.1.1.100 maps to eth1 and the CES IP 9.1.1.100 maps to
eth2.
To use an alias address for CES, you need to provide a static IP address that is not already defined as an
alias in the /etc/sysconfig/network-scripts directory.
Before you enable the node as a CES node, configure the network adapters for each subnet that are
represented in the CES address pool:
1. Define a static IP address for the device:
/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=none
IPADDR=10.1.1.10
NETMASK=255.255.255.0
ONBOOT=yes
GATEWAY=10.1.1.1
TYPE=Ethernet
2. Ensure that there are no aliases that are defined in the network-scripts directory for this interface:
# ls -l /etc/sysconfig/network-scripts/ifcfg-eth1:*
ls: /etc/sysconfig/network-scripts/ifcfg-eth1:*: No such file or directory
After the node is enabled as a CES node, no further action is required. CES addresses are added as
aliases to the already configured adapters.
Note: If you have multiple CES networks, even IP address distribution in each network for every
node might not be considered. The overall number of IP addresses on each node or CES group
takes precedence.
Specify mmces address move to manually move IP addresses from one node to another node.
balanced-load
Distributes the addresses to approach an optimized load distribution. The load (network and
CPU) on all the nodes is monitored. Addresses are moved based on given policies for optimized
load throughout the cluster.
node-affinity
Attempts to keep an address on the node to which the user manually assigned it. If the mmces
address add command is used with the --ces-node option, the address is marked as being
associated with that node. Similarly, if an address is moved with the mmces address move
command, the address is marked as being associated with the destination node. Any automatic
movement, such as reassigning a down node's addresses, does not change this association.
Addresses that are enabled with no node specification do not have a node association.
Addresses that are associated with a node but assigned to a different node are moved back to the
associated node if possible.
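The distribution policy itself is selected with the mmces address policy command. For example, the following sketch (based on the mmces command reference) selects the node-affinity policy:
mmces address policy node-affinity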
Automatic address distribution is performed in the background in a way as to not disrupt the protocol
servers more than necessary. If you want immediate redistribution of the addresses, use the mmces
command to force an immediate rebalance:
mmces address move --rebalance
In order to prevent an interruption in service, IP addresses that have attributes assigned to them (for
example: object_database_node or object_singleton_node) are not rebalanced.
You can further control the assignment of CES addresses by placing nodes or addresses in CES groups.
For more information, see the topic “Configuring CES protocol service IP addresses” on page 26.
Command examples:
mmces service enable [NFS | OBJ | SMB]
mmces service disable [NFS | OBJ | SMB]
When a protocol is disabled, the protocol is stopped on all CES nodes and all protocol-specific
configuration data is removed.
For example:
mmces node suspend [-N Node[,Node...]]
After a node is resumed, monitoring on the node is started and the node is eligible for address
assignments.
NFS monitoring
The NFS servers are monitored to ensure that they function properly. If a problem is found, the CES addresses of
the node are reassigned, and the node state is set to failed. When the problem is corrected, the node
resumes normal operation.
Configuration options for the NFS service can be set with the mmnfs config command.
You can use the mmnfs config command to set and list default settings for NFS such as the port number
for the NFS service, the default access mode for exported file systems, the log level, and enable or disable
status for delegations. For a list of configurable attributes, see the topic mmnfs command in the IBM
Spectrum Scale: Command and Programming Reference.
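For example, to display the current NFS defaults and then change one of them, where Attribute=Value stands for one of the configurable attributes listed in the mmnfs command reference:
mmnfs config list
mmnfs config change Attribute=Value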
Some of the attributes, such as the protocol version, can be overridden for a given export on a per-client
basis. For example, the default settings might have NFS protocols 3 and 4 enabled, but the export for a
client might restrict it to NFS version 4 only.
Exports can be added, removed, or changed with the mmnfs export command. Authentication must be set
up before you define an export.
Exports can be declared for any directory in the GPFS file system, including a fileset junction. At the time
when exports are declared, these folders must exist physically in GPFS. Only folders in the GPFS file
system can be exported. Folders that are located only locally on a server node cannot be exported
because they cannot be used in a failover situation.
Export-add and export-remove operations can be applied while the NFS service is running. The
export-change operation requires a restart of the NFS service on all server nodes, which is followed by a
60-second grace period to allow connected clients to reclaim their locks and to avoid concurrent lock
requests from new clients.
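A sketch of adding an export for a directory in the file system, where the path, the client subnet, and the access option are assumptions for illustration:
mmnfs export add /gpfs/fs0/fileset1 --client "192.0.2.0/24(Access_Type=RW)"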
NFS failover
When a CES node leaves the cluster, the CES addresses assigned to that node are redistributed among the
remaining nodes. Remote clients that access the GPFS file system might see a pause in service while the
internal state information is passed to the new servers.
NFS clients
When you work with NFS clients, consider the following points:
v If you mount the same NFS export on one client from two different IBM Spectrum Scale NFS protocol
nodes, data corruption might occur.
v The NFS protocol version that is used as the default on a client operating system might differ from
what you expect. If you are using a client that mounts NFSv3 by default, and you want to mount
NFSv4, then you must explicitly specify NFSv4 in the mount command. For more information, see the
mount command for your client operating system.
v To prevent NFS clients from encountering data integrity issues during failover, ensure that NFS clients
are mounted with the option -o hard.
v A client must mount an NFS export by using the IP address of the GPFS system. If a host name is
used, ensure that the name is unique and remains unique.
If a DNS Round Robin (RR) entry name is used to mount an NFSv3 export, data unavailability might
occur, due to unreleased locks. The lock manager on the GPFS file system is not clustered-system-aware.
If you want to put highly available NFS services on top of the GPFS file system, you have the choice
between clustered NFS (Chapter 31, “Implementing a clustered NFS environment on Linux,” on page 463)
and Cluster Export Services (Chapter 32, “Implementing Cluster Export Services,” on page 467).
To help you choose one of these NFS offerings, consider the following points:
Multiprotocol support
If you plan to use other protocols (such as SMB or Object) in addition to NFS, CES must be
chosen. While CNFS provides support only for NFS, the CES infrastructure adds support also for
SMB and Object. With CES, you can start with NFS and add (or remove) other protocols at any
time.
Command support
While CNFS provides native GPFS command support for creation and management of the CNFS
cluster, it lacks commands to manage the NFS service and NFS exports. The CES infrastructure
introduces native GPFS commands to manage the CES cluster. Furthermore, there are also
commands to manage the supported protocol services and the NFS exports. For example, with the
mmnfs command you can create, change, and remove NFS exports.
Note: CES provides a different interface to obtain performance metrics for NFS. CNFS uses the
existing interfaces to obtain NFS metrics from the kernel (such as nfsstat or the /proc interface).
The CES framework provides the mmperfmon query command for Ganesha-based NFS statistics.
For more information, see the topic mmperfmon command in the IBM Spectrum Scale: Command and
Programming Reference.
SELinux support
CES, including the CES framework as well as SMB and CES NFS, does not support SELinux in
enforcing mode.
Migration of CNFS to CES
For information about migrating existing CNFS environments to CES, see “Migration of CNFS
clusters to CES clusters” on page 477.
Note: Some of the features described below require a version higher than 4.1.1.
SMB clients can connect to any of the protocol nodes and get access to the shares defined. A clustered
registry makes sure that all nodes see the same configuration data. Therefore, clients can connect to any
Cluster Export Services (CES) node and see the same data. Moreover, the state of opened files (share
modes, open modes, access masks, locks, and so on) is also shared among the CES nodes so that data
integrity is maintained. On failures, clients can reconnect to another protocol node and IP addresses are
transferred to another protocol node.
The supported protocol levels are SMB2 and the base functions of SMB3 (dialect negotiation, secure
negotiation, encryption of data on the wire).
With the mmsmb command, IBM Spectrum Scale provides a comprehensive entry point to manage all
SMB-related configuration tasks like creating, changing, and deleting SMB shares.
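For example, creating a share for an existing directory might look like the following sketch, where the share name and path are assumptions:
mmsmb export add smbshare1 /gpfs/fs0/smbshare1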
The monitoring framework detects issues with the SMB services and triggers a failover in the case of an
unrecoverable error.
Integrated installation
The SMB services are installed by the integrated installer together with the CES framework and the other
protocols NFS and Object.
The SMB services can be configured to authenticate against the authentication services Microsoft Active
Directory and LDAP. Mapping Microsoft security identifiers (SIDs) to the POSIX user and group IDs on
the file server can either be done automatically by using the so-called autorid mechanism or external
mapping services like RFC 2307 or Microsoft Services for Unix. If none of the offered authentication and
mapping schemes matches the environmental requirements, a user-defined configuration can be
established.
The Pike release of OpenStack is used for Swift, Keystone, and their dependent packages.
Object monitoring
The object servers are monitored to ensure that they function properly. If a problem is found, the CES
addresses of the node are reassigned, and the node state is set to failed. When the problem is corrected,
the node resumes normal operation.
The Object service configuration is controlled by the respective Swift and Keystone configuration files.
The master versions of these files are stored in the CCR repository, and copies exist in the /etc/swift and
/etc/keystone directories on each protocol node. The files that are stored in those directories should not
be directly modified since they are overwritten by the files that are stored in the CCR. To change the
Swift or Keystone configuration, use the mmobj config change command to modify the master copy of
configuration files stored in CCR. The monitoring framework is notified of the change and propagates the
file to the local file system of the CES nodes. For information about the values that can be changed and
their associated function, refer to the administration guides for Swift and Keystone.
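For example, changing a property in the master copy of a Swift configuration file might look like the following sketch; the file, section, and property shown are illustrative, and the option names follow the mmobj command reference:
mmobj config change --ccrfile proxy-server.conf --section DEFAULT --property log_level --value DEBUG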
A base fileset must be specified when the Object service is configured. An existing fileset can be used, or
a new fileset can be created; a new fileset is created automatically in the GPFS file system that is specified
during installation. Evaluate the data that is expected to be stored by the Object service to determine the
required number of inodes. This expected number of inodes is specified during
installation, but it can be updated later by using standard GPFS file system and fileset management
commands.
Object failover
When a CES node leaves the cluster, the CES addresses that are assigned to that node are redistributed
among the remaining nodes. Remote clients that access the Object service might see active connections
drop or a pause in service while the CES addresses are moved to the new servers. Clients with
active connections to the CES addresses that are migrated might have their connections unexpectedly
drop. Clients are expected to retry their requests when this happens.
Certain Object-related services can be migrated when a node is taken offline. If the node was hosting the
backend database for Keystone or certain Swift services that are designated as singletons (such as the
auditor), those services are started on the active node that received the associated CES addresses of the
failed node. Normal operation of the Object service resumes after the CES addresses are reassigned and
necessary services automatically restarted.
Object clients
The Object service is based on Swift and Keystone, and externalizes their associated interfaces. Clients
should follow the associated specifications for those interfaces. Clients must be able to handle dropped
connections or delays during CES node failover. In such situations, clients should retry the request or
allow more time for the request to complete.
To connect to an Object service, clients should use a load balancer or DNS service to distribute requests
among the pool of CES IP addresses. Clients in a production environment should not use hard-coded
CES addresses to connect to Object services. For example, the authentication URL should refer to a DNS
host name or a load balancer front end name such as http://protocols.gpfs.net:35357/v3 rather than a
CES address.
Object storage consumes fileset inodes when the unified file and object access layout is used. One inode
is used for each file or object, and one inode is used for each directory in the object path.
In the traditional object layout, objects are placed in the following directory path:
/ibm/gpfs/objfs/o/z1device111/objects/11247/73a/afbeca778982b05b9dddf4fed88f773a/
1461036399.66296.data
Similarly, account and container databases are placed in the following directory paths:
/ibm/gpfs/objfs/ac/z1device62/accounts/13700/f60/d61003e46b4945e0bbbfcee341d30f60/
d61003e46b4945e0bbbfcee341d30f60.db
/ibm/gpfs/objfs/ac/z1device23/containers/3386/0a9/34ea8d244872a1105b7df2a2e6ede0a9/
34ea8d244872a1105b7df2a2e6ede0a9.db
Starting at the bottom of the object path and working upward, each new object that is created requires a
new hash directory and a new object file, thereby consuming two inodes. Similarly, for account and
container data, each new account and container require a new hash directory and a db file. Also, a
db.pending and a lock file is required to serialize access. Therefore, four inodes are consumed for each
account and each container at the hash directory level.
If the parent directories do not already exist, they are created, thereby consuming additional inodes. The
hash suffix directory is three hexadecimal characters, so there can be a maximum of 0xFFF or 4096 suffix
directories per partition. The total number of partitions is specified during initial configuration. For IBM
Spectrum Scale, 16384 partitions are allocated to objects and the same number is allocated to accounts
and containers.
For each object partition directory, the hashes.pkl file is created to track the contents of the partition
subdirectories. Also, there is a .lock file that is created for each partition directory to serialize updates to
hashes.pkl. This is a total of three inodes required for each object partition.
There are 128 virtual devices allocated to object data during initial configuration, and the same number is
allocated to account and container data. For each virtual device a tmp directory is created to store objects
during upload. In the async_pending directory, container update requests that time out are stored until
they are processed asynchronously by the object updater service.
The total number of inodes used for object storage in the traditional object layout can be estimated as
follows:
total required inodes = account & container inodes + object inodes
As per this information, there are four inodes per account hash directory and four inodes per container
hash directory. In the worst case, there would be one suffix directory, one partition directory, and one
virtual device directory for each account and container. Therefore, the maximum inodes for accounts and
containers can be estimated as:
account and container inodes = (7 * maximum number of accounts) + (7 * maximum number of containers)
In a typical object store there are more objects than containers, and more containers than accounts.
Therefore, while estimating the required inodes, we estimate the number of inodes required for accounts
and containers to be seven times the maximum number of containers. The maximum required inodes can
be calculated as shown below:
max required inodes = (inodes for objects and hash directory) + (inodes required for hash directories) +
(inodes required for partition directories and partition metadata) +
(inodes required for virtual devices) + (inodes required for containers)
max required inodes = (2 x maximum number of objects) + (4096 inodes per partition * 16384 partitions) +
(16384 partitions * 3) + (128 inodes) + (7 * maximum number of containers)
Note: This applies to a case when all objects as well as account and container data are in the same fileset.
While using multiple storage policy filesets or a different fileset for account and container data, the
calculations must be adjusted.
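For example, assuming a maximum of 1,000,000 objects and 10,000 containers (illustrative figures only), the estimate works out to:
max required inodes = (2 x 1,000,000) + (4096 x 16384) + (16384 x 3) + 128 + (7 x 10,000)
= 2,000,000 + 67,108,864 + 49,152 + 128 + 70,000
= approximately 69.2 million inodes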
Cluster Export Services (CES) protocol nodes have the following dependencies and restrictions:
v CES nodes cannot coexist with CNFS clusters.
v The concepts of failover in CES node groups and CNFS failover groups are slightly different. While
CNFS allows failover not just within a group but also within ranges, CES does not. Make sure that
your failover concepts are handled correctly by CES.
v CES nodes use SMB, NFS, and OpenStack Swift Object services.
v File system ACL permissions need to be in NFSv4 format.
v File system ACL semantics need to be set to NFSv4 format: nfs4 ACL semantics in effect.
v CES SMB (Samba) services expects NFSv4 ACL formats.
v Existing CNFS exports definitions are not compatible with CES NFS. It is best to script and automate
the creation of the equivalent exports by using the mmnfs export add command to reduce the amount
you need to change in the future.
v CES nodes need authentication that is configured.
v There is a maximum of 16 protocol nodes in a CES cluster if the SMB protocol is also enabled.
v There is a maximum of 32 protocol nodes in a CES cluster if only NFS is enabled.
Because CNFS and CES nodes are mutually exclusive, you need to plan for a user and application access
outage while the CES cluster nodes are installed, configured, set up for authentication, and the NFS
exports are re-created. The duration of this process depends on the complexity of the customer
environment.
You might want to procure new CES nodes or reuse the existing CNFS nodes. Either way, you cannot use
the installation toolkit until the CNFS nodes are unconfigured.
If you do not have an opportunity to test or plan the implementation of a CES cluster elsewhere, you
might have to deal with the design and implementation considerations and issues during the planned
outage period. Usually this process is straightforward and quick. If you have a more complex
environment, however, it might take longer than the allotted upgrade window to complete the migration.
In this case, it is possible to set up one or two non-CNFS, NFS servers to serve NFS for a short time.
During this time, you would move all your CNFS IPs to these nodes as you decommission the CNFS
cluster. Then, after you successfully set up your CES nodes, authentication, and corresponding exports,
you can move the IPs from the temporary NFS servers over to the CES nodes.
You need to make a copy of the exports configuration file /etc/exports so that you can use this file as
the basis for creating the new exports in CES NFS. CES NFS exports configuration needs to be created by
using the mmnfs export add command or created in bulk by using the mmnfs export load command.
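For example, a hedged sketch of preserving the old configuration and re-creating a single export under CES NFS might look like the following; the export path and client options are illustrative assumptions, and the exact client attribute values are described in the mmnfs command reference.
cp -p /etc/exports /root/etc_exports.cnfs.backup
mmnfs export add /gpfs/fs0/export1 --client "198.51.100.0/24(Access_Type=RW,Squash=no_root_squash)"
Repeat the mmnfs export add command (or use mmnfs export load with a prepared configuration file) for each export that was previously defined in /etc/exports.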
5. Consider de-referencing the GPFS variable cnfsSharedRoot, although this step is not a requirement.
6. You can now delete the /etc/exports file on each of the CNFS nodes. Ensure that you have a backup
copy of this file to use as a reference when you create the exports under CES NFS.
7. Run systemctl disable nfs to ensure kNFS does not start automatically.
ExportCfgFile contains a listing of all your exports as defined in the format that is used for
/etc/ganesha/gpfs.ganesha.exports.conf.
5. Alternately, you can manually re-create each export on the CES cluster by using the mmnfs command.
mmnfs export add Path --client ClientOptions
6. Before you proceed to configure CES nodes, remove the NFS exports from /etc/exports from each of
the old CNFS nodes
7. Add the IPs that were previously assigned to CNFS to the address pool to be managed by CES by
using the following command:
mmces address add --node node1Name --ces-ip ipAddress
See “CES network configuration” on page 468 for details about how to use this command.
For more information on creating protocol data exports, see Fileset considerations for creating protocol data
exports in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Test and verify that you have the same level of access to the NFS exports as you did on CNFS to ensure
that your applications and NFS clients can continue without further changes.
GPFS uses 32-bit ID namespace as the canonical namespace, and Windows SIDs are mapped into this
namespace as needed. Two different mapping algorithms are used (depending on system configuration):
v GPFS built-in auto-generated mapping
v User-defined mappings stored in the Microsoft Windows Active Directory using the Microsoft Identity
Management for UNIX (IDMU) component
Auto-generated ID mappings
Auto-generated ID mappings are the default. If no explicit mappings are created by the system
administrator in the Active Directory using Microsoft Identity Management for UNIX (IDMU), all
mappings between security identifiers (SIDs) and UNIX IDs will be created automatically using a
reserved range in UNIX ID space.
Note: If you have a mix of GPFS running on Windows and other Windows clients accessing the
integrated SMB server function, the ability to share data between these clients has not been tested or
validated. With protocol support, the SMB server can also be configured to automatically generate ID
mappings. If you want to ensure that SMB users do not access data with Windows users through a shared
ID mapping, ensure that the automatic ID mapping range for the SMB server is different from this range.
The range of IDs that is automatically generated for the SMB server can be controlled with the mmuserauth command.
Unless the default reserved ID range overlaps with an ID that is already in use, no further configuration is
needed to use the auto-generated mapping function. If you have a specific file system or subtree that is
accessed only by user applications from Windows nodes (even if AIX or Linux nodes are used as NSD
servers), auto-generated mappings are sufficient for all application needs.
The default reserved ID range used by GPFS starts with ID 15,000,000 and covers 15,000,000 IDs. The
reserved range should not overlap with any user or group ID in use on any AIX or Linux nodes. To
change the starting location or the size of the reserved ID range, use the following GPFS configuration
parameters:
sidAutoMapRangeLength
Controls the length of the reserved range for Windows SID to UNIX ID mapping.
sidAutoMapRangeStart
Specifies the start of the reserved range for Windows SID to UNIX ID mapping.
Note: For planning purposes, remember that auto-generated ID mappings are stored permanently with
file system metadata. A change in the sidAutoMapRangeStart value is only effective for file systems
created after the configuration change.
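For example, a hedged sketch of moving the reserved range before any file system is created might look like the following; the values are illustrative only and must not overlap IDs that are already in use.
mmchconfig sidAutoMapRangeStart=25000000
mmchconfig sidAutoMapRangeLength=20000000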
To add the IDMU service when Active Directory is running on Windows Server 2008, follow these steps:
1. Open Server Manager.
2. Under Roles, select Active Directory Domain Services.
3. Under Role Services, select Add Role Services.
4. Under the Identity Management for UNIX role service, select Server for Network Information
Services.
5. Click Next, then Install.
6. Restart the system when the installation completes.
Typically it is a good idea to configure all the required ID mappings before you mount a GPFS file
system for the first time. Doing so ensures that IBM Spectrum Scale stores only properly remapped IDs
on the disk. However, you can add or delete ID mappings at any time while a GPFS file system is
mounted. IBM Spectrum Scale checks for mapping changes every 60 seconds and uses updated mappings
immediately.
When you configure an IDMU mapping for an ID that is already recorded in file metadata, you must be
careful to avoid corrupting IDMU mappings and disrupting access to files. An auto-generated mapping
that is already stored in an access control list (ACL) on disk continues to map correctly to a Windows
SID. However, the SID is now mapped to a different UNIX ID. When you access a file with an ACL that
contains the auto-generated ID, the access appears to IBM Spectrum Scale to be an access by a different
user. Depending on the file access permissions, the ID might not be able to access files that were
previously accessible.
To restore proper file access for the affected ID, configure a new mapping and then rewrite the affected
ACL. Rewriting replaces the auto-generated ID with an IDMU-mapped ID. To determine whether the
ACL for a particular file contains auto-generated IDs or IDMU-mapped IDs, examine file ownership and
permission information from a UNIX node, for example by issuing the mmgetacl command.
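For example, a minimal check from a UNIX node might look like the following sketch; the file path is an assumption. Numeric IDs in the reserved auto-generated range (by default, 15,000,000 and above) indicate ACL entries that still carry auto-generated mappings.
mmgetacl /gpfs/fs0/data/report.doc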
1. Click Start > Administrative Tools > Active Directory Users and Computers.
2. To see a list of the users and groups in this domain, select the Users branch in the tree on the left
under the branch for your domain.
3. To open the Properties window for a user or group, double-click the user or group line. If IDMU is
set up correctly, the window includes a UNIX Attributes tab, as is shown in the following figure:
Note: The field is labeled “NIS Domain” rather than just “Domain” because the IDMU subsystem
was originally designed to support integration with the UNIX Network Information System (NIS).
IBM Spectrum Scale does not use NIS.
b. In the UID field, enter a user ID. For group objects, enter a GID. Entering this information creates
a bidirectional mapping between a UNIX ID and the corresponding Windows SID. To ensure that
all mappings are unique, IDMU does not allow you to use the same UID or GID for more than
one user or group.
Note: You can create mappings for some built-in accounts in the Builtin branch of the Active
Directory Users and Computers window.
c. You do not need to enter any information in the Primary group name/GID field. IBM Spectrum
Scale does not use it.
5. To close the Properties window, click OK.
For more information on AFM-based Async DR, see the topic AFM-based Asynchronous Disaster Recovery
(AFM DR) in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Important: Our initial feedback from the field suggests that success of a disaster recovery solution
depends on administration discipline, including careful design, configuration and testing. Considering
this, IBM has decided to disable the Active File Management-based Asynchronous Disaster Recovery
feature (AFM DR) by default and require that customers deploying the AFM DR feature first review their
deployments with IBM Spectrum Scale development. You should contact IBM Spectrum Scale Support at
[email protected] to have your use case reviewed. IBM will help optimize your tuning parameters and
enable the feature. Please include this message while contacting IBM Support.
These limitations do not apply to base AFM support. They apply only to AFM-based Async DR, which is
available with IBM Spectrum Scale Advanced Edition V4.2 and V4.1.1.
For more information, see Flash (Alert): IBM Spectrum Scale (GPFS) V4.2 and V4.1.1 AFM Async DR
requirement for planning.
Although an overview of the steps to perform these operations manually is provided, it is recommended
that you use the mmcesdr command because it automates DR setup, failover, failback, backup, and restore
actions. For more information about the mmcesdr command, see mmcesdr command in IBM Spectrum Scale:
Command and Programming Reference.
Ensure that the following prerequisites are met on the secondary cluster for disaster recovery in an IBM
Spectrum Scale cluster with protocols.
v IBM Spectrum Scale is installed and configured.
v IBM Spectrum Scale code levels are the same on the primary and secondary clusters.
v IBM Spectrum Scale code levels are the same on every protocol node within a cluster.
v Cluster Export Services (CES) are installed and configured, and the shared root file system is defined.
v All protocols that are configured on the primary cluster are also configured on the secondary cluster.
v Authentication on the secondary cluster is identical to the authentication on the primary cluster.
v All exports that need to be protected by using AFM DR must have the same device and fileset name, and
the same fileset link point on the secondary cluster as defined on the primary cluster.
v The IBM NFSv3 stack must be configured on the home cluster for the AFM DR transport of data.
v No data must be written to exports on the secondary cluster while it is acting only as a secondary
cluster, that is, before a failover.
The following limitations apply for disaster recovery in an IBM Spectrum Scale cluster with protocols.
This example consists of three NFS exports, three SMB shares, one object fileset, and two unified file and
object access filesets that are also NFS exports. For the SMB and NFS exports, only two of each are
independent filesets, which allows an AFM-based Async DR (AFM DR) configuration. For simplicity, the
filesets are named according to whether they are dependent or independent for the SMB and NFS exports.
Dependent filesets are included as exports to show the warnings that are given when an export path is
not an independent fileset link point.
Note: SMB and NFS exports must be named according to their fileset link point names for them to be
captured by the mmcesdr command for protocols cluster disaster recovery. For example, if you have a
fileset nfs-smb-combo, the NFS or the SMB export name must be GPFS_Path/nfs-smb-combo. If you use a
name in the fileset's subdirectory for the SMB or the NFS export (for example: GPFS_Path/nfs-smb-combo/
nfs1), the mmcesdr command does not capture that export.
NFS exports
v /gpfs/fs0/nfs-ganesha-dep
v /gpfs/fs0/nfs-ganesha1
v /gpfs/fs0/nfs-ganesha2
SMB shares
v /gpfs/fs0/smb1
v /gpfs/fs0/smb2
v /gpfs/fs0/smb-dep
To handle a possible node failure, you need to specify at least two nodes on each cluster as gateway
nodes. Using the example setup mentioned in “Example setup for protocols disaster recovery” on page 486,
the command to specify two gateway nodes on the primary cluster is as follows:
# mmchnode -N clusternode-vm1,clusternode-vm2 --gateway
Tue Apr 28 20:59:01 MST 2015: mmchnode: Processing node clusternode-vm2
Tue Apr 28 20:59:01 MST 2015: mmchnode: Processing node clusternode-vm1
mmchnode: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
# Tue Apr 28 20:59:04 MST 2015: mmcommon pushSdr_async:
mmsdrfs propagation started
Similarly, you need to specify at least two nodes on the DR cluster as gateway nodes. Using the example
setup, the command to specify gateway nodes on the DR cluster is as follows:
# mmchnode -N clusternode-vm1,clusternode-vm2 --gateway
Tue Apr 28 20:59:49 MST 2015: mmchnode: Processing node clusternode-vm2
Tue Apr 28 20:59:49 MST 2015: mmchnode: Processing node clusternode-vm1
mmchnode: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
# Tue Apr 28 20:59:51 MST 2015: mmcommon pushSdr_async:
mmsdrfs propagation started
File to be used with secondary cluster in next step of cluster DR setup: /root//DR_Config
Note: In this command example, there are two exports that are not protected. During the
configuration step, any exports that are not protected through AFM DR generate a warning to the
standard output of the command.
2. Use the following command to transfer the DR configuration file from the primary cluster to the
secondary cluster.
scp /root//DR_Config clusternode-vm1:/root/
The system displays output similar to the following:
root@clusternode-vm1’s password:
DR_Config 100% 1551 1.5KB/s 00:00
3. On the secondary cluster, use the following command to create the independent filesets that are part
of the AFM DR fileset pairs associated with those on the primary cluster. In addition to creating
filesets, this command also creates the necessary NFS exports.
mmcesdr secondary config --input-file-path /root/ --inband
The system displays output similar to the following:
Performing step 1/3, creation of independent filesets to be used for AFM DR.
Successfully completed step 1/3, creation of independent filesets to be used for AFM DR.
Performing step 2/3, creation of NFS exports to be used for AFM DR.
Successfully completed step 2/3, creation of NFS exports to be used for AFM DR.
Performing step 3/3, conversion of independent filesets to AFM DR secondary filesets.
Successfully completed step 3/3, conversion of independent filesets to AFM DR secondary filesets.
4. Ensure that all of the expected AFM DR pairs show as Active in the output of the mmafmctl command
and take corrective action if they do not.
# mmafmctl fs0 getstate
Fileset Name Fileset Target Cache State Gateway Node Queue Length Queue numExec
------------ -------------- ------------- ------------ ------------ -------------
nfs-ganesha1 nfs://9.11.102.210/gpfs/fs0/nfs-ganesha1 Active clusternode-vm2 0 4
nfs-ganesha2 nfs://9.11.102.211/gpfs/fs0/nfs-ganesha2 Active clusternode-vm1.tuc.stglabs.ibm.com 0 4
combo1 nfs://9.11.102.210/gpfs/fs0/combo1 Active clusternode-vm1.tuc.stglabs.ibm.com 0 7
combo2 nfs://9.11.102.211/gpfs/fs0/combo2 Active clusternode-vm2 0 66
smb1 nfs://9.11.102.210/gpfs/fs0/smb1 Active clusternode-vm1.tuc.stglabs.ibm.com 0 65
smb2 nfs://9.11.102.211/gpfs/fs0/smb2 Active clusternode-vm1.tuc.stglabs.ibm.com 0 4
Fileset Name Fileset Target Cache State Gateway Node Queue Length Queue numExec
------------ -------------- ------------- ------------ ------------ -------------
object_fileset nfs://9.11.102.211/gpfs/fs1/object_fileset Active clusternode-vm1.tuc.stglabs.ibm.com 0 95671
obj_sofpolicy1 nfs://9.11.102.211/gpfs/fs1/obj_sofpolicy1 Active clusternode-vm1.tuc.stglabs.ibm.com 0 27
obj_sofpolicy2 nfs://9.11.102.210/gpfs/fs1/obj_sofpolicy2 Active clusternode-vm1.tuc.stglabs.ibm.com 0 26
async_dr nfs://9.11.102.210/gpfs/fs1/.async_dr Active clusternode-vm1.tuc.stglabs.ibm.com 0 2751
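If a pair is not Active, you can recheck an individual fileset after taking corrective action. A minimal check, using a fileset name from the example setup, might look like the following:
mmafmctl fs0 getstate -j nfs-ganesha1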
With the addition of the new optional parameter --allowed-nfs-clients, you can specify exactly which
clients are allowed to connect to the NFS transport exports that are created on the secondary cluster. This
parameter can be used for both inband and outband setup. Here is an example of the parameter being
used for an inband setup:
v On the primary cluster, run the following command to configure independent fileset exports as AFM
DR filesets and to back up configuration information:
mmcesdr primary config --output-file-path /root/ --ip-list "9.11.102.211,9.11.102.210" --rpo
15 --inband --allowed-nfs-clients --gateway-nodes
File to be used with secondary cluster in next step of cluster DR setup: /root//DR_Config
2. Use the following command to transfer the DR configuration file from the primary cluster to the
secondary cluster.
scp /root//DR_Config clusternode-vm1:/root/
The system displays output similar to the following:
root@clusternode-vm1’s password:
DR_Config 100% 2566 2.5KB/s 00:00
Transfer all data on primary cluster for fileset fs1:obj_sofpolicy2 to fileset fs1:obj_sofpolicy2 on
secondary cluster.
Transfer all data on primary cluster for fileset fs1:object_fileset to fileset fs1:object_fileset on
secondary cluster.
Fileset Name Fileset Target Cache State Gateway Node Queue Length Queue numExec
------------ -------------- ------------- ------------ ------------ -------------
nfs-ganesha1 nfs://9.11.102.210/gpfs/fs0/nfs-ganesha1 Active clusternode-vm2 0 4
nfs-ganesha2 nfs://9.11.102.211/gpfs/fs0/nfs-ganesha2 Active clusternode-vm1.tuc.stglabs.ibm.com 0 4
combo1 nfs://9.11.102.210/gpfs/fs0/combo1 Active clusternode-vm1.tuc.stglabs.ibm.com 0 7
combo2 nfs://9.11.102.211/gpfs/fs0/combo2 Active clusternode-vm2 0 66
smb1 nfs://9.11.102.210/gpfs/fs0/smb1 Active clusternode-vm1.tuc.stglabs.ibm.com 0 65
smb2 nfs://9.11.102.211/gpfs/fs0/smb2 Active clusternode-vm1.tuc.stglabs.ibm.com 0 4
Fileset Name Fileset Target Cache State Gateway Node Queue Length Queue numExec
Note: A state of Dirty is normal when data is actively being transferred from the primary cluster to
the secondary cluster.
When the primary cluster fails in an IBM Spectrum Scale cluster with protocols, you can fail over to the
secondary cluster and re-create the file export configuration.
On the secondary cluster, after the primary cluster fails, issue the following command.
mmcesdr secondary failover
When the primary cluster fails in an IBM Spectrum Scale cluster with protocols, you can fail over to the
secondary cluster and restore the file export configuration.
================================================================================
= If all steps completed successfully, please remove and then re-create file
= authentication on the DR cluster.
= Once this is complete, Protocol Cluster Failover will be complete.
================================================================================
2. After failover, remove the file authentication on the secondary cluster and then add it back, this time
pointing to the secondary cluster. Failover is considered complete, and client operations can resume,
only after the file authentication has been added back.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
The system displays output similar to the following:
Performing failback to primary on all AFM DR protected filesets.
Successfully completed failback to primary on all AFM DR protected filesets
2. On the old primary cluster, use the following command one or more times until the amount of time it
takes to complete the operation is less than the RPO value that you have set.
mmcesdr primary failback --apply-updates --input-file-path "/root/"
The system displays output similar to the following:
Performing apply updates on all AFM DR protected filesets.
Longest elapsed time is for fileset fs1:object_fileset and is 0 Hrs. 25 Mins. 20 Secs.
Successfully completed failback update on all AFM DR protected filesets.
Depending on user load on the acting primary, this step may need to be performed again before stopping
failback.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
3. On the secondary cluster (acting primary), quiesce all client operations.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
5. On the old primary cluster, use the following command.
mmcesdr primary failback --stop --input-file-path "/root/"
The system displays output similar to the following:
Performing stop of failback to primary on all AFM DR protected filesets.
Successfully completed stop failback to primary on all AFM DR protected filesets.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
6. On the old primary cluster, use the following command to restore configuration:
mmcesdr primary restore
The system displays output similar to the following:
Restoring cluster and enabled protocol configurations/exports.
Successfully completed restoring cluster and enabled protocol configurations/exports.
7. On the secondary cluster (acting primary), use the following command to convert it back to a
secondary cluster and associate it with the original primary cluster:
mmcesdr secondary failback --post-failback-complete
The system displays output similar to the following:
Performing step 1/2, converting protected filesets back into AFM DR secondary filesets.
Successfully completed step 1/2, converting protected filesets back into AFM DR secondary
filesets.
Performing step 2/2, restoring/recreating AFM DR-based NFS share configuration.
Successfully completed step 2/2, restoring/recreating AFM DR-based NFS share configuration.
================================================================================
= If all steps completed successfully, remove and then re-create file
= authentication on the Secondary cluster.
= Once this is complete, Protocol Cluster Failback will be complete.
================================================================================
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
3. On the secondary cluster (acting primary), quiesce all client operations.
4. On the old primary cluster, use the following command one more time.
mmcesdr primary failback --apply-updates --input-file-path "/root/"
The system displays output similar to the following:
Performing apply updates on all AFM DR protected filesets.
Longest elapsed time is for fileset fs1:object_fileset and is 0 Hrs. 0 Mins. 27 Secs.
Successfully completed failback update on all AFM DR protected filesets.
Depending on user load on the acting primary, this step may need to be performed again before stopping
failback.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
5. On the old primary cluster, use the following command.
mmcesdr primary failback --stop --input-file-path "/root/"
The system displays output similar to the following:
Performing stop of failback to primary on all AFM DR protected filesets.
Successfully completed stop failback to primary on all AFM DR protected filesets.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
6. On the old primary cluster, use the following command to restore configuration:
mmcesdr primary restore --file-config --restore
The system displays output similar to the following:
Restoring cluster and enabled protocol configurations/exports.
Successfully completed restoring cluster and enabled protocol configurations/exports.
================================================================================
= If all steps completed successfully, remove and then re-create file
= authentication on the Primary cluster.
= Once this is complete, Protocol Cluster Configuration Restore will be complete.
================================================================================
7. On the primary cluster, remove the file authentication and then add it again.
8. On the secondary cluster (acting primary), use the following command to convert it back to a
secondary cluster and associate it with the original primary cluster.
mmcesdr secondary failback --post-failback-complete --input-file-path /root --file-config --restore
The system displays output similar to the following:
Performing step 1/2, converting protected filesets back into AFM DR secondary filesets.
Successfully completed step 1/2, converting protected filesets back into AFM DR secondary
filesets.
================================================================================
= If all steps completed successfully, remove and then re-create file
= authentication on the Secondary cluster.
= Once this is complete, Protocol Cluster Failback will be complete.
================================================================================
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
9. On the secondary cluster, remove the file authentication and then add it again.
File to be used with new primary cluster in next step of failback to new primary cluster:
/root//DR_Config
2. Transfer the newly created DR configuration file to the new primary cluster.
scp /root//DR_Config clusternode-vm1:/root/
The system displays output similar to the following:
root@clusternode-vm1’s password:
DR_Config 100% 1996 2.0KB/s 00:00
3. On the new primary cluster, use the following command to create the independent filesets that will
receive the data transferred from the recovery snapshots.
mmcesdr primary failback --prep-outband-transfer --input-file-path "/root/"
The system displays output similar to the following:
Creating independent filesets to be used as recipients of AFM DR outband transfer of data.
Successfully completed creating independent filesets to be used as recipients of AFM DR outband
transfer of data.
Transfer data from the recovery snapshots through outband trucking to the newly created independent
filesets before proceeding to the next step.
4. Transfer data from within the recovery snapshots of the secondary cluster to the new primary
cluster.
Attention: If the transferred files also need their GPFS extended attributes to be preserved, extra steps
are required. This example uses standard rsync, which does not transfer extended attributes.
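As a hedged sketch only, a plain rsync transfer of one fileset from a recovery snapshot might look like the following; the snapshot path, target node name, and fileset path are assumptions, and this form does not preserve GPFS extended attributes.
rsync -avH /gpfs/fs0/.snapshots/psnap-recovery-nfs-ganesha1/nfs-ganesha1/ \
newprimary-node:/gpfs/fs0/nfs-ganesha1/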
5. On the new primary cluster, use the following command to convert the independent filesets to
primary filesets and generate a new DR configuration file that will be used on the primary cluster
for the next steps and then transferred to the secondary cluster to be used in a later step.
mmcesdr primary failback --convert-new --output-file-path /root/ --input-file-path /root/
The system displays output similar to the following:
Performing step 1/2, conversion of independent filesets into new primary filesets to be used for AFM DR.
Successfully completed step 1/2, failback to primary on all AFM DR protected filesets.
Performing step 2/2, creation of output file for remaining failback to new primary steps.
File to be used with new primary cluster in next step of failback to new primary cluster: /root//DR_Config
6. On the new primary cluster, use the following command.
mmcesdr primary failback --start --input-file-path "/root/"
The system displays output similar to the following:
Performing failback to primary on all AFM DR protected filesets.
Successfully completed failback to primary on all AFM DR protected filesets.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
7. On the new primary cluster, use the following command one or more times until the amount of time
it takes to complete the operation is less than the RPO value that you have set.
mmcesdr primary failback --apply-updates --input-file-path "/root/"
The system displays output similar to the following:
Performing apply updates on all AFM DR protected filesets.
Longest elapsed time is for fileset fs1:obj_sofpolicy1 and is 0 Hrs. 45 Mins. 10 Secs.
Successfully completed failback update on all AFM DR protected filesets.
Depending on user load on the acting primary, this step may need to be performed again before
stopping failback.
8. On the secondary cluster (acting primary), quiesce all client operations.
9. On the new primary cluster, use the following command one more time.
mmcesdr primary failback --apply-updates --input-file-path "/root/"
The system displays output similar to the following:
Performing apply updates on all AFM DR protected filesets.
Longest elapsed time is for fileset fs1:obj_sofpolicy1 and is 0 Hrs. 0 Mins. 16 Secs.
Successfully completed failback update on all AFM DR protected filesets.
Depending on user load on the acting primary, this step may need to be performed again before
stopping failback.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
10. On the new primary cluster, use the following command to stop the failback process and convert the
new primary filesets to read/write.
mmcesdr primary failback --stop --input-file-path "/root/"
The system displays output similar to the following:
Performing stop of failback to primary on all AFM DR protected filesets.
Successfully completed stop failback to primary on all AFM DR protected filesets.
11. On the new primary cluster, use the following command to restore the protocol and export services
configuration information.
mmcesdr primary restore --new-primary
Note: The --new-primary option must be used to ensure protocol configuration is restored correctly.
The system displays output similar to the following:
Restoring cluster and enabled protocol configurations/exports.
Successfully completed restoring cluster and enabled protocol configurations/exports.
12. Transfer the updated DR configuration file from the new primary cluster to the secondary cluster.
scp /root//DR_Config clusternode-vm1:/root/
The system displays output similar to the following:
root@clusternode-vm1’s password:
DR_Config 100% 2566 2.5KB/s 00:00
================================================================================
= If all steps completed successfully, remove and then re-create file
= authentication on the Secondary cluster.
= Once this is complete, Protocol Cluster Failback will be complete.
================================================================================
File to be used with new primary cluster in next step of failback to new primary cluster:
/root//DR_Config
2. Transfer the newly created DR configuration file to the new primary cluster.
scp /root//DR_Config clusternode-vm1:/root/
The system displays output similar to the following:
root@clusternode-vm1’s password:
DR_Config 100% 1996 2.0KB/s 00:00
3. On the new primary cluster, use the following command to create the independent filesets that will
receive the data transferred from the recovery snapshots.
mmcesdr primary failback --prep-outband-transfer --input-file-path "/root/"
The system displays output similar to the following:
Creating independent filesets to be used as recipients of AFM DR outband transfer of data.
Successfully completed creating independent filesets to be used as recipients of AFM DR outband
transfer of data.
Transfer data from the recovery snapshots through outband trucking to the newly created independent
filesets before proceeding to the next step.
4. Transfer data from within the recovery snapshots of the secondary cluster to the new primary
cluster.
Attention: If the transferred files also need their GPFS extended attributes to be preserved, extra steps
are required. This example uses standard rsync, which does not transfer extended attributes.
5. On the new primary cluster, use the following command to convert the independent filesets to
primary filesets and generate a new DR configuration file that will be used on the primary cluster
for the next steps and then transferred to the secondary cluster to be used in a later step.
mmcesdr primary failback --convert-new --output-file-path /root/ --input-file-path /root/
The system displays output similar to the following:
Performing step 1/2, conversion of independent filesets into new primary filesets to be used for AFM DR.
Successfully completed step 1/2, failback to primary on all AFM DR protected filesets.
Performing step 2/2, creation of output file for remaining failback to new primary steps.
Successfully completed step 2/2, creation of output file for remaining failback to new primary steps.
File to be used with new primary cluster in next step of failback to new primary cluster: /root//DR_Config
6. On the new primary cluster, use the following command.
mmcesdr primary failback --start --input-file-path "/root/"
The system displays output similar to the following:
Performing failback to primary on all AFM DR protected filesets.
Successfully completed failback to primary on all AFM DR protected filesets.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
Note: The --input-file-path parameter is optional but it might be needed if access to the
configuration file is not available in the configuration fileset.
10. On the new primary cluster, use the following command to stop the failback process and convert the
new primary filesets to read/write.
mmcesdr primary failback --stop --input-file-path "/root/"
The system displays output similar to the following:
Performing stop of failback to primary on all AFM DR protected filesets.
Successfully completed stop failback to primary on all AFM DR protected filesets.
11. On the new primary cluster, use the following command to restore the protocol and export services
configuration information.
mmcesdr primary restore --new-primary --file-config --restore
Note: The --new-primary option must be used to ensure protocol configuration is restored correctly.
The system displays output similar to the following:
Restoring cluster and enabled protocol configurations/exports.
Successfully completed restoring cluster and enabled protocol configurations/exports.
================================================================================
= If all steps completed successfully, remove and then re-create file
= authentication on the Primary cluster.
= Once this is complete, Protocol Cluster Configuration Restore will be complete.
================================================================================
12. On the primary cluster, remove the file authentication and then add it again.
13. Transfer the updated DR configuration file from the new primary cluster to the secondary cluster.
scp /root//DR_Config clusternode-vm1:/root/
The system displays output similar to the following:
root@clusternode-vm1’s password:
DR_Config 100% 2566 2.5KB/s 00:00
14. On the secondary cluster, use the following command to register the new primary AFM IDs to the
independent filesets on the secondary cluster acting as part of the AFM DR pairs.
mmcesdr secondary failback --post-failback-complete --new-primary --input-file-path "/root"
--file-config --restore
The system displays output similar to the following:
================================================================================
= If all steps completed successfully, remove and then re-create file
= authentication on the Secondary cluster.
= Once this is complete, Protocol Cluster Failback will be complete.
================================================================================
15. On the secondary cluster, remove the file authentication and then add it again.
Note: The following steps only describe how to back up and restore the protocols and CES configuration
information. The actual data contained in protocol exports would need to be backed up and restored
separately.
1. On the primary cluster, use the following command to back up the configuration information:
mmcesdr primary backup
The system displays output similar to the following:
Performing step 1/2, configuration fileset creation/verification.
Successfully completed step 1/2, configuration fileset creation/verification.
Performing step 2/2, protocol and export services configuration backup.
Successfully completed step 2/2, protocol and export services configuration backup.
For backup, you can use IBM Spectrum Protect (formerly known as Tivoli Storage Manager) or some
other tool. For example, you can use mmbackup as follows:
mmbackup configuration_fileset_link_point --scope inodespace -t full
2. On the primary cluster, restore data from the off-cluster storage into the configuration fileset. If
mmbackup was used to back up the configuration fileset, the IBM Spectrum Protect command to restore it
is similar to the following:
dsmc restore -subdir=yes "configuration_fileset_link_point/*"
3. On the primary cluster, use the following command to restore the configuration information:
mmcesdr primary restore
The system displays output similar to the following:
Restoring cluster and enabled protocol configurations/exports.
Successfully completed restoring cluster and enabled protocol configurations/exports.
In some cases, running the mmcesdr primary restore command might display the following error
message: Saved configuration file does not exist. In this case, do the following:
v If this cluster is part of a Protocols DR relationship, place a copy of the DR configuration file at a
specified location and run the mmcesdr primary restore command again using the
--input-file-path option.
v If this cluster is not part of a Protocols DR relationship, run this command again with the
--file-config --restore option to force restoring the file configuration information. The system
displays output similar to the following:
# mmcesdr primary restore --file-config --restore
Restoring cluster and enabled protocol configurations/exports.
Successfully completed restoring cluster and enabled protocol configurations/exports.
Note: If you want to perform a restore as part of a failback (either to an old primary cluster or a new
primary cluster) and want to re-create the file configuration/exports, use one of the following
commands:
mmcesdr primary restore
or
mmcesdr primary restore --file-config --recreate
You can use the following command to update the backed up configuration information for Object, NFS,
and SMB protocols, and CES.
mmcesdr primary update {--obj | --nfs | --smb | --ces}
Note: No output is generated by the command, because this command is designed to be scripted and
run on a regular basis or run as a part of a callback. The update command can only be used to update
the primary configuration after Protocols DR has been configured. If no secondary cluster exists, use the
mmcesdr primary backup command to back up configuration for later restore.
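For example, a hedged sketch of scheduling regular updates through cron might look like the following; the schedule and the selection of protocols are assumptions.
# /etc/cron.d/mmcesdr-update (illustrative): refresh the backed up NFS and SMB configuration nightly.
0 1 * * * root /usr/lpp/mmfs/bin/mmcesdr primary update --nfs
5 1 * * * root /usr/lpp/mmfs/bin/mmcesdr primary update --smb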
Use the following information to collect the data required for protocols cluster disaster recovery.
Note: IBM Spectrum Scale 4.2 and later versions for object storage support either AFM DR-based
protection or multi-region object deployment, but not both. If multi-region object deployment is enabled,
no object data or configuration information is protected through protocols cluster DR.
In addition to the standard object fileset, independent filesets are created for each object policy that is
created. This in turn creates additional CCR files that are listed here. All of these additional filesets and
additional configuration information are protected, if IBM Spectrum Scale for object storage is using the
AFM DR-based protection and not multi-region object deployment.
You can determine all object filesets using the mmobj policy list command.
The object related files in CCR that need to be backed up are as follows:
1. account.builder
2. account.ring.gz
3. account-server.conf
4. container.builder
The following CCR files also need to be backed up for local object authentication:
v keystone.conf
v keystone-paste.ini
v logging.conf
v wsgi-keystone.conf
For a list of object authentication related CCR files and variables that need to be backed up, see
“Authentication related data required for protocols cluster DR” on page 511.
Failover steps for object configuration if you are using local authentication for
object
Use the following steps on a protocol node in the secondary cluster to fail over the object configuration
data. Use this set of steps if you are using local authentication for object.
Important:
The following object steps must be run on the node designated as object_database_node in the secondary
cluster. This ensures that postgresql-obj and Keystone servers can connect during this configuration
process.
1. Stop the object protocol services using the following command:
mmces service stop OBJ --all
2. Make two changes in the preserved Cluster Configuration Repository (CCR) configuration to update
it for the DR environment:
a. The keystone.conf file: Edit this file to change the database connection address to
object_database_node of the secondary cluster. For example:
Change this:
[database]
connection = postgresql://keystone:[email protected]/keystone
to this:
[database]
connection = postgresql://keystone:[email protected]/keystone
Note: If the mmcesdr command is used to save the protocol cluster configuration, then the
preserved copy of the keystone.conf file is located at the following location:
CES_shared_root_mount_point/.async_dr/Failover_Config/Object_Config/latest/ccr_files/
keystone.conf
You can edit the file directly to make this change or use the openstack-config command. For
example, first retrieve the current value by using the get option and then update it by using the set option:
openstack-config --get keystone.conf database connection
openstack-config --set keystone.conf database connection \
postgresql://keystone:[email protected]/keystone
Note: If the mmcesdr command is used to save the protocol cluster configuration, then the
preserved copy of the ks_dns_name variable is located as a line in the following file:
CES_shared_root_mount_point/.async_dr/Failover_Config/Object_Config/latest/ccr_vars/
file_for_ccr_variables.txt
Change the value of the variable in this preserved copy of the file.
3. [Optional] If spectrum-scale-localRegion.conf exists from CCR, change the cluster hostname and
cluster_id properties to the cluster host name and cluster id as shown in the output of the
mmlscluster command.
4. Restore the Postgres database information to the shared root directory. The directory needs to be
cleaned out before the archive is restored. This can be done with commands similar to the following,
assuming that the directory was archived with tar and gzip when it was backed up:
a. Delete the old Postgres data:
rm -rf <shared_root_location>/object/keystone/*
b. Verify that the shared root directory is empty:
ls <shared_root_location>/object/keystone
c. Restore the current Postgres database:
tar xzf <tar_file_name>.gz -C <shared_root_location>
d. Delete the process status file from the primary:
rm -rf <shared_root_location>/object/keystone/postmaster.pid
e. List the Postgres files:
ls <shared_root_location>/object/keystone
5. Restore all object configuration CCR files except objRingVersion, including keystone.conf with the
modification for object_database_node, with a command similar to the following:
mmccr fput <file> <location>/<file>
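For example, a hedged sketch of pushing back a set of saved account and container files from an assumed local backup directory might look like the following; the directory name is an assumption, and keystone.conf is the copy that was already edited for object_database_node.
# Push each saved object configuration file back into CCR; objRingVersion is restored separately in step 7.
cd /root/object_ccr_backup
for f in account.builder account.ring.gz account-server.conf container.builder keystone.conf; do
mmccr fput "$f" "/root/object_ccr_backup/$f"
done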
6. If object policies are present, restore all of the object policy related CCR files.
7. Restore the object configuration CCR file objRingVersion.
8. Restore all object configuration CCR variables, including ks_dns_name with the modification for
cluster host name, with a command similar to the following:
mmccr vput name value
9. Start the Postgres database and verify that it is running successfully using commands similar to the
following:
systemctl start postgresql-obj
sleep 5
systemctl status postgresql-obj
Note: If SSL is enabled, SSL certificates must be in place when you are saving keystone.conf from
another cluster.
13. If the DEFAULT admin_token is set, save its current value by using a command similar to the
following:
openstack-config --get /etc/keystone/keystone.conf DEFAULT admin_token
If a value is returned from the above command, save it because it will need to be restored later.
14. In the keystone.conf file, set admin_token to ADMIN using the openstack-config command as follows.
openstack-config --set /etc/keystone/keystone.conf DEFAULT admin_token ADMIN
15. Set the following environment variables.
export OS_TOKEN=ADMIN # The value from admin_token
export OS_URL="http://127.0.0.1:35357/v3"
16. Start Keystone services and get the list of endpoint definitions using commands similar to the
following.
systemctl start httpd
sleep 5
openstack endpoint list
These commands generate output similar to:
+------------+--------+--------------+--------------+---------+-----------+--------------------------------------------------------------|
| ID | Region | Service Name | Service Type | Enabled | Interface | URL |
+------------+--------+--------------+--------------+---------+-----------+--------------------------------------------------------------|
| c36e..9da5 | None | keystone | identity | True | public | http://specscaleswift.example.com:5000/ |
| f4d6..b040 | None | keystone | identity | True | internal | http://specscaleswift.example.com:35357/ |
| d390..0bf6 | None | keystone | identity | True | admin | http://specscaleswift.example.com:35357/ |
| 2e63..f023 | None | swift | object-store | True | public | http://specscaleswift.example.com:8080/v1/AUTH_%(tenant_id)s |
| cd37..9597 | None | swift | object-store | True | internal | http://specscaleswift.example.com:8080/v1/AUTH_%(tenant_id)s |
| a349..58ef | None | swift | object-store | True | admin | http://specscaleswift.example.com:8080 |
+------------+--------+--------------+--------------+---------+-----------+--------------------------------------------------------------|
17. Update the host name specified in the endpoint definitions in the URL value from the endpoint list.
The values in the endpoint table might have the cluster host name (ces1 in this example) from the
primary system. They need to be updated for the cluster host name in the DR environment. In some
environments, the cluster host name is the same between the primary and secondary clusters. If that
is the case, skip this step.
a. Delete the existing endpoints with the incorrect cluster host name. For each of the endpoints, use
the ID value to delete the endpoint. For example, use a command similar to this to delete each of
the six endpoints:
openstack endpoint delete e149
b. Recreate the endpoints with the cluster host name of the secondary cluster using the following
commands:
The CHN variable in the following commands is the cluster host name for the secondary cluster.
openstack endpoint create identity public "http://$CHN:5000/v3"
openstack endpoint create identity internal "http://$CHN:35357/v3"
openstack endpoint create identity admin "http://$CHN:35357/v3"
openstack endpoint create object-store public "http://$CHN:8080/v1/AUTH_%(tenant_id)s"
openstack endpoint create object-store internal "http://$CHN:8080/v1/AUTH_%(tenant_id)s"
openstack endpoint create object-store admin "http://$CHN:8080"
c. Verify that the endpoints are now using the correct cluster host name using the following
command:
openstack endpoint list
18. If the api_v3 pipeline had to be updated previously, return it to its original value by running the
following command, where <savedAPI_V3Pipeline> is the value of the api_v3 pipeline that was saved above:
mmobj config change --ccrfile keystone-paste.ini --section pipeline:api_v3 --property pipeline --value <savedAPI_V3Pipeline>
Failover steps for object configuration if you are not using local authentication for object:
Use the following steps on a protocol node on a secondary cluster to fail over object configuration data.
Use this set of steps if you are not using local authentication for object.
1. Stop the object protocol services using the following command:
mmces service stop OBJ --all
2. Restore all object configuration CCR files except objRingVersion with a command similar to the
following:
mmccr fput <file> <location>/<file>
3. If object policies are present, restore all of the object policy related CCR files (that is, the *.builder and
*.ring.gz files).
4. Restore the object configuration CCR file objRingVersion.
5. Restore all object configuration CCR variables with a command similar to the following:
mmccr vput name value
6. Start the object protocol services using the following command:
mmces service start OBJ --all
You need to determine the object_database_node on the primary cluster once it is repaired or replaced.
v Object database node: This is the CES IP address which is configured to run the postgresql-obj
database for object services. You can find this value as the address designated as the
object_database_node in the output of the mmces address list command. For example:
mmces address list
Important:
The following object steps must be run on the node designated as object_database_node in the primary
cluster. This ensures that postgresql-obj and Keystone servers can connect during this configuration
process.
You can determine the SMB shares using the mmsmb export list command.
The SMB protocol related files that need to be backed up are as follows.
v account_policy.tdb
v autorid.tdb (see the note that follows this list)
v group_mapping.tdb
v passdb.tdb
v registry.tdb
v secrets.tdb
v share_info.tdb
v ctdb.tdb
Note: The autorid.tdb file is required only if file authentication is configured with Active Directory.
The private Kerberos configuration files available at the following location also need to be backed up:
/var/lib/samba/smb_krb5/. You can copy these files from this location and save them.
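For example, a hedged sketch of saving these files might look like the following; the target directory is an assumption.
mkdir -p /root/smb_krb5_backup
cp -p /var/lib/samba/smb_krb5/* /root/smb_krb5_backup/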
If the NFS exports are independent filesets, AFM-based Disaster Recovery (AFM DR) can be used to
replicate the data.
The NFS protocol related CCR files that need to be backed up are as follows.
v gpfs.ganesha.main.conf
v gpfs.ganesha.nfsd.conf
v gpfs.ganesha.log.conf
v gpfs.ganesha.exports.conf
v gpfs.ganesha.statdargs.conf
The following NFS protocol related CCR variable needs to be backed up.
v nextexportid
The following authentication related CCR file needs to be backed up for disaster recovery.
v authccr
Depending on the file authentication scheme you are using, additional files need to be backed up.
The object authentication related files in CCR that need to be backed up are as follows:
v keystone.conf
v keystone-paste.ini
v logging.conf
v wsgi-keystone.conf
v ks_ext_cacert.pem
v keystone_ssl.tar
v authccr
The object authentication related variables in CCR that need to be backed up are as follows:
v OBJECT_AUTH_TYPE
v PREV_OBJECT_AUTH_TYPE (this variable might not be present if the authentication type has not changed)
v OBJECT_IDMAPDELETE
v ks_db_type
v ks_db_user
v ks_dns_name
Note: The object authentication does not need to be removed and re-added as part of failover, failback, or
restore.
1. Save the current file authentication information on the secondary cluster.
2. Remove file authentication from the secondary cluster.
3. Restore file authentication on the secondary cluster based on the information saved in step 1.
Note: The object authentication does not need to be removed and re-added as part of failover, failback, or
restore.
1. Save the current file authentication information on the primary cluster.
2. Remove file authentication from the primary cluster.
3. Restore file authentication on the primary cluster based on the information saved in step 1.
Use the following steps on a protocol node in the primary cluster and the secondary cluster to restore the
authentication configuration.
Note: The object authentication does not need to be removed and re-added as part of failover, failback, or
restore.
1. Save the current file authentication information on the primary cluster.
2. Remove file authentication from the primary cluster.
3. Restore file authentication on the primary cluster based on the information saved in step 1.
4. Save the current file authentication information on the secondary cluster.
5. Remove file authentication from the secondary cluster.
6. Restore file authentication on the secondary cluster based on the information saved in step 4.
Cluster Configuration Repository (CCR) files that need to be backed up for CES in a disaster recovery
scenario are as follows:
v mmsdrfs
v cesiplist
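For example, a hedged sketch of pulling these two files out of CCR for safekeeping might look like the following; the target directory is an assumption, and mmccr fget is used here as the counterpart of the mmccr fput usage shown earlier in this chapter.
mkdir -p /root/ces_ccr_backup
mmccr fget mmsdrfs /root/ces_ccr_backup/mmsdrfs
mmccr fget cesiplist /root/ces_ccr_backup/cesiplist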
For a shared-nothing cluster, there are two typical configurations: replica-based IBM Spectrum Scale
(shared-nothing cluster) and IBM Spectrum Scale FPO.
If you do not run any workloads that could benefit from data locality (for example, SAP HANA with IBM
Spectrum Scale on x86_64 machines, Hadoop, Spark, IBM DB2® DPF, or IBM DashDB), do not configure the
shared-nothing cluster as IBM Spectrum Scale FPO; for such workloads, you just need to configure
replica-based IBM Spectrum Scale. Otherwise, you can configure it as IBM Spectrum Scale FPO (File
Placement Optimizer). With IBM Spectrum Scale FPO, you can control the replica location in the file
system.
When you create a storage pool over a shared-nothing cluster, configuring allowWriteAffinity=yes for the
storage pool enables data locality for the data that is stored in that pool; this is called FPO mode.
Configuring allowWriteAffinity=no for the storage pool is called replica-based shared-nothing mode. After
the file system is created, the storage pool property allowWriteAffinity cannot be changed.
In this chapter, all data locality related concepts (for example, allowWriteAffinity, chunks, extended failure
groups, write affinity failure group, and write affinity depth) are effective only for IBM Spectrum Scale
FPO mode. All other concepts in this chapter also apply to replica-based shared-nothing clusters.
Note: This feature is available with IBM Spectrum Scale Standard Edition or higher.
Note: At the fileset level, a write affinity depth of 2 is designed to assign (write) all the files in a fileset to
the same second-replica node. However, this behavior depends on node status in the cluster.
After a node is added to or deleted from a cluster, a different node might be selected as the
second replica for files in a fileset.
See the description of storage pool stanzas that follows. Also, see the following command
descriptions in the IBM Spectrum Scale: Command and Programming Reference:
v mmadddisk
v mmchattr
v mmcrfs
v mmchpolicy
v mmapplypolicy
v mmchpool
Write affinity failure group
Write affinity failure group is a policy that indicates the range of nodes (in a shared nothing
architecture) where replicas of blocks in a particular file are to be written. The policy allows the
application to determine the layout of a file in the cluster to optimize for typical access patterns.
You specify the write affinity failure group through the write-affinity-failure-group
WafgValueString attribute of the mmchattr command. You can also specify write affinity failure
group through the setWADFG attribute of the mmchpolicy and mmapplypolicy command.
Failure group topology vector ranges specify the nodes, and the specification is repeated for each
replica of the blocks in a file.
For example, the attribute 1,1,1:2;2,1,1:2;2,0,3:4 indicates:
v The first replica is on rack 1, rack location 1, nodes 1 or 2.
v The second replica is on rack 2, rack location 1, nodes 1 or 2.
v The third replica is on rack 2, rack location 0, nodes 3 or 4.
The default policy is a null specification. This default indicates that each replica follows the
storage pool or the file write affinity depth (WAD) definition for data placement and is not
wide striped over all disks.
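The following is a minimal sketch of applying the example specification above to an existing file with mmchattr, assuming the long-form option name matches the write-affinity-failure-group attribute described above; the file path is illustrative only:
# set the write affinity failure group of one file to the example specification
mmchattr --write-affinity-failure-group "1,1,1:2;2,1,1:2;2,0,3:4" /sncfs/file1G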
Note: To change the failure group of a disk in a write-affinity–enabled storage pool, you must
use the mmdeldisk and mmadddisk commands. You cannot use mmchdisk to change it directly.
See the following command descriptions in the IBM Spectrum Scale: Command and Programming
Reference:
v mmchpolicy
v mmapplypolicy
v mmchattr
Enabling the FPO features
To efficiently support write affinity and the rest of the FPO features, GPFS internally requires the
creation of special allocation map formats. When you create a storage pool that is to contain files
that make use of FPO features, you must specify allowWriteAffinity=yes in the storage pool
stanza.
To enable the policy to read from preferred replicas, issue one of the following commands:
v To specify that the policy read from the first replica, regardless of whether there is a replica on
the local disk, issue the following command:
mmchconfig readReplicaPolicy=default
v To specify that the policy read replicas from the local disk, if the local disk has data, issue the
following:
mmchconfig readReplicaPolicy=local
v To specify that the policy read replicas from the fastest disk to read from based on the disk's
read I/O statistics, run the following:
mmchconfig readReplicaPolicy=fastest
Note: In an FPO-enabled file system, if you run data locality aware workloads over FPO, such
as Hadoop or Spark, configure readReplicaPolicy as local so that data is read from the local disks
and network bandwidth consumption is reduced.
See the description of storage pool stanzas that follows. Also, see the following command
descriptions in the IBM Spectrum Scale: Command and Programming Reference:
v mmadddisk
v mmchconfig
v mmcrfs
Storage pool stanzas
Storage pool stanzas are used to specify the type of layout map and write affinity depth, and to
enable write affinity, for each storage pool.
Storage pool stanzas have the following format:
%pool:
pool=StoragePoolName
blockSize=BlockSize
usage={dataOnly | metadataOnly | dataAndMetadata}
layoutMap={scatter | cluster}
allowWriteAffinity={yes | no}
writeAffinityDepth={0 | 1 | 2}
blockGroupFactor=BlockGroupFactor
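For example, a sketch of a data pool stanza that enables FPO-style placement; the pool name and values are illustrative only and must match your disk layout and workload:
%pool:
pool=fpodata
blockSize=1M
usage=dataOnly
layoutMap=cluster
allowWriteAffinity=yes
writeAffinityDepth=1
blockGroupFactor=128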
See the following command descriptions in the IBM Spectrum Scale: Command and Programming
Reference:
v mmadddisk
Ideally, all the failure groups should have an equal number of disks with roughly equal capacity. If one
failure group is much smaller than the rest, it is likely to fill up faster than the others, which
complicates rebalancing actions.
After the initial ingesting of data, the cluster might be unbalanced. In such a situation, use the
mmrestripefs command with the -b option to rebalance the data.
Note: For FPO users, the mmrestripefs -b command breaks the original data placement that follows the
data locality rule.
When a file is synced from home to cache, it follows the same FPO placement rule as when written from
the gateway node in the cache cluster. When a file is synced from cache to home, it follows the same FPO
data placement rule as when written from the NFS server in the home cluster.
To retain the same file placement at the AFM home and cache, ensure that each has the same cluster
configuration and set the write affinity failure group for each file. If the home and cache clusters have
different configurations, such as the disk number, node number, or failure group, then the data locality
might be broken.
Configuring FPO
Follow the steps listed in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide to install the
IBM Spectrum Scale RPMs and build the portability layer on all nodes in the cluster. Configure
password-less SSH for the root user across all IBM Spectrum Scale nodes. In cases of special
security control, you can instead configure at least one node from which the root user can access all IBM
Spectrum Scale nodes in password-less mode; IBM Spectrum Scale commands can then be run only from those nodes.
For operating systems with Linux kernel 2.6, run the following commands as root on all IBM Spectrum Scale nodes
to set vm.min_free_kbytes:
# TOTAL_MEM=$(cat /proc/meminfo | grep MemTotal | tr -d "[:alpha:]" | tr -d "[:punct:]" | tr -d "[:blank:]")
# VM_MIN_FREE_KB=$((${TOTAL_MEM}*6/100))
# echo "vm.min_free_kbytes = $VM_MIN_FREE_KB" >> /etc/sysctl.conf
# sysctl -p
# sysctl -a | grep vm.min_free_kbytes
Create the IBM Spectrum Scale cluster with node11 as the primary and node21 as the secondary cluster
configuration server. Set the -A flag to automatically start GPFS daemons when the OS is started.
# mmcrcluster -A -C gpfs-cluster -p node11 -s node21 -N nodefile -r $(which ssh) -R $(which scp)
All IBM Spectrum Scale nodes require a license designation before they can be used. The FPO feature
introduced a dedicated license class, fpo. In an IBM Spectrum Scale FPO cluster, all quorum and
manager nodes require a server license. Based on the sample environment, node11, node21, and node31
require a server license. The other nodes require an fpo license.
# mmchlicense server --accept -N node11,node21,node31
# mmchlicense fpo --accept -N node12,node13,node14,node15,node16,node22,node23,node24,node25,node26,
node32,node33,node34,node35,node36
Use the mmlslicense -L command to view license information for the cluster.
Start the IBM Spectrum Scale cluster to verify whether it starts successfully. Use the mmstartup -a
command to start the IBM Spectrum Scale cluster and the mmgetstate -a command to view the state of
the IBM Spectrum Scale cluster.
Note: In the example, /dev/sda is not included because this is the OS disk.
If MapReduce intermediate and temporary data is stored on ext3/ext4 disks instead of IBM Spectrum
Scale, make sure those disks are not included in the disk file, or IBM Spectrum Scale will format
them and include them in the IBM Spectrum Scale cluster.
– System pool disks:
- Should have usage=metadataOnly. It is possible to use usage=dataAndMetadata if there is a reason
to have data on the system pool disks. The block size of the dataAndMetadata system pool must
be the same as the block size of a data pool in the file system.
- failureGroup must be a single number if allowWriteAffinity is not enabled (specify
allowWriteAffinity=no for system pool definition when doing mmcrnsd or mmcrfs) and it should
be the same for all disks on the same node. If allowWriteAffinity is enabled for system pool, the
failure group can be of format rack,position,node, for example, 2,0,1; or, it can take the traditional
single-number failure group format also.
- Even when allowWriteAffinity is enabled for system pool, the metadata does not follow data
locality rules; these rules apply only to data placement
– Data pool disks:
- Must have usage=dataOnly.
- failureGroup must be of the format rack,position,node, where position is either 0 or 1 to
represent the top or bottom half of the rack. The sample environment does not have half racks, so the
same position is used for all nodes. If the position and node fields are not significant in the
cluster, the failure group can be defined with the rack number only, in the form rack,-,-.
Example of NSD disk file created by using the mmcrnsd command:
# gpfstest9
%nsd: nsd=node9_meta_sdb device=/dev/sdb servers=gpfstest9 usage=metadataOnly failureGroup=1 pool=system
#gpfstest10
%nsd: nsd=node10_meta_sda device=/dev/sda servers=gpfstest10 usage=metadataOnly failureGroup=2 pool=system
#gpfstest11
If any disks are previously used by IBM Spectrum Scale, you must use the -v no flag to force IBM
Spectrum Scale to use them again.
Note: Use the -v no flag only if you are sure that the disk can be used by IBM Spectrum Scale.
Use the mmcrnsd -F diskfile [-v no] command to create NSDs and use the mmlsnsd -m command to
display the NSDs.
Set the IBM Spectrum Scale page pool to 25% of system memory on each node. For Hadoop or NoSQL
applications, the page pool of IBM Spectrum Scale FPO can be configured larger for better performance, for
example, 30% of physical memory.
In this example, all nodes have the same amount of memory, which is a best practice. If some nodes have
different memory, set the page pool on a per-node basis by using the -N flag.
# TOTAL_MEM=$(cat /proc/meminfo | grep MemTotal | tr -d "[:alpha:]" | tr -d "[:punct:]" | tr -d "[:blank:]")
# PAGE_POOL=$((${TOTAL_MEM}*25/(100*1024)))
# mmchconfig pagepool=${PAGE_POOL}M
Use the mmlsconfig and mmdiag commands to see the configuration changes:
# mmlsconfig
# mmdiag --config
To use FPO, a single file system is recommended. The following example creates a file system with
mount point /mnt/gpfs that is set to mount automatically. This mount point is used in the Hadoop
configuration later. The replication for both data and metadata is set to 3 replicas. Quotas are not
activated on this file system. An inode size of 4096 is recommended for typical MapReduce data sizes;
the -S and -E settings, which control atime and mtime update behavior, also help performance.
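The following is a hedged sketch of such a creation command, assuming the NSD stanza file from the earlier step is named diskfile; adjust the flags to your environment:
# mmcrfs gpfs-fpo-fs -F diskfile -A yes -T /mnt/gpfs -m 3 -M 3 -r 3 -R 3 -i 4096 -Q no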
For more information on the pool configuration, see “Create IBM Spectrum Scale Network Shared Disks
(NSD)” on page 521.
Use the mmdf command to view the disk usage for the file system:
# mmdf gpfs-fpo-fs
After you create the rule file, use the mmchpolicy command to enable the policy:
# mmchpolicy gpfs-fpo-fs policyfile -I yes
Use the mmlspolicy command to display the currently active rule definition:
# mmlspolicy gpfs-fpo-fs -L
Note: If MapReduce intermediate and temporary data is not stored on IBM Spectrum Scale, that is,
mapred.cluster.local.dir in MRv1, or yarn.nodemanager.log-dirs and yarn.nodemanager.local-dirs in
Hadoop YARN, do not point to an IBM Spectrum Scale directory, you do not need to go through this
section.
Use the mmcrfileset command to create two filesets, one for local intermediate data and one for
temporary data:
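A minimal sketch, assuming the fileset names that are linked in the next step:
# mmcrfileset gpfs-fpo-fs mapred-local-fileset
# mmcrfileset gpfs-fpo-fs mapred-tmp-fileset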
After the fileset is created, it must be linked to a directory under this IBM Spectrum Scale file system
mount point. This example uses /mnt/gpfs/mapred/local for intermediate data and /mnt/gpfs/tmp for
temporary data. As /mnt/gpfs/mapred/local is a nested directory, the directory structure must exist
before linking the fileset. These two directories are required for configuring Hadoop.
# mkdir -p $(dirname /mnt/gpfs/mapred/local)
# mmlinkfileset gpfs-fpo-fs mapred-local-fileset -J /mnt/gpfs/mapred/local
# mmlinkfileset gpfs-fpo-fs mapred-tmp-fileset -J /mnt/gpfs/tmp
The next step in setting up the filesets is to apply an IBM Spectrum Scale policy so that the filesets act like local
directories on each node. This policy instructs IBM Spectrum Scale not to replicate the data for these two
filesets, and because these filesets are stored on the data pool, they can use the FPO features that keep local
writes on local disks. Metadata is still replicated three times, which can result in performance
overhead. File placement policies are evaluated in the order they are entered, so ensure that the policies
for the filesets appear before the default rule.
# cat policyfile
rule 'R1' SET POOL 'datapool' REPLICATE (1,1) FOR FILESET ('mapred-local-fileset')
rule 'R2' SET POOL 'datapool' REPLICATE (1,1) FOR FILESET ('mapred-tmp-fileset')
rule default SET POOL 'datapool'
# mmchpolicy gpfs-fpo-fs policyfile -I yes
Use the mmlspolicy command to display the currently active rule definition:
# mmlspolicy gpfs-fpo-fs -L
In each of these filesets, create a subdirectory for each node that runs Hadoop jobs. Based on the sample
environment, this script creates these subdirectories:
# cat mk_gpfs_local_dirs.sh
#!/bin/sh
for nodename in $(mmlsnode -N all); do
mkdir -p /mnt/gpfs/tmp/${nodename}
mkdir -p /mnt/gpfs/mapred/local/${nodename}
done
To check that the rules are working properly, you can write some test files and verify their replication
settings. For example:
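The following is a hedged sketch of such a check; the test file names are illustrative, and mmlsattr -L reports the replication factors that were actually applied:
# dd if=/dev/zero of=/mnt/gpfs/mapred/local/testfile bs=1M count=16
# dd if=/dev/zero of=/mnt/gpfs/tmp/testfile bs=1M count=16
# expect data replication 1 for files governed by the fileset rules above
# mmlsattr -L /mnt/gpfs/mapred/local/testfile | grep -i replication
# mmlsattr -L /mnt/gpfs/tmp/testfile | grep -i replication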
To make sure that MapReduce jobs can write to the IBM Spectrum Scale file system, assign permissions to
the CLUSTERADMIN user. CLUSTERADMIN is the user who starts Hadoop namenode and datanode
service, for example, user hdfs.
# chown -R CLUSTERADMIN:CLUSTERADMINGROUP /mnt/gpfs
# chmod -R +rx /mnt/gpfs
Note: If the max_sectors_kb of your disks is small (for example, 256 or 512) and you are not allowed to tune
the above values (that is, you get an "invalid argument" error as in the example above), then your disk
performance might be impacted because IBM Spectrum Scale I/O requests might be split into several
smaller requests according to the limits that max_sectors_kb places at the block device level.
As discussed in Step 1 tuning recommendations, any tuning done by echoing to sysfs files will be lost
when a node reboots. To make such a tuning permanent, either create appropriate udev rules or place
these commands in a boot file that is run on each reboot.
As udev rules are the preferred way of accomplishing this kind of block device tuning, the following is an
example of a generic udev rule that enables the block device tuning recommended in steps 1 and 2
for all block devices. This rule can be enabled by creating it as the file
/etc/udev/rules.d/100-hdd.rules:
ACTION=="add|change", SUBSYSTEM=="block", ATTR{device/model}=="*",
ATTR{queue/nr_requests}="256", ATTR{device/queue_depth}="32",
ATTR{queue/max_sectors_kb}="16384"
If it is not desirable to tune all block devices with the same settings, multiple rules can be created
with specific tuning for the appropriate devices. To create such device-specific rules, you can use the
KERNEL match key to limit which devices the udev rules apply to (for example, KERNEL=="sdb"). The following
example script can be used to create udev rules that tune only the block devices used by IBM
Spectrum Scale:
#!/bin/bash
#clean up any existing /etc/udev/rules.d/100-hdd.rules files
/usr/lpp/mmfs/bin/mmdsh -N All "rm -f /etc/udev/rules.d/100-hdd.rules"
#collect all disks in use by GPFS and create udev rules one disk at a time
/usr/lpp/mmfs/bin/mmlsnsd -X | /bin/awk ' { print $3 " " $5 } ' | \
/bin/grep dev |
while read device node ; do
device=$(echo $device | /bin/sed 's/\/dev\///' )
echo $device $node
echo "ACTION==\"add|change\", SUBSYSTEM==\"block\", \
KERNEL==\"$device\", ATTR{device/model}==\"*\", \
ATTR{queue/nr_requests}=\"256\", \
ATTR{device/queue_depth}=\"32\", ATTR{queue/max_sectors_kb}=\"16384\" " > \
Note: The previous example script must be run from a node that has ssh access to all nodes in the
cluster. It creates udev rules that set the recommended block device tuning on future reboots.
To put the recommended tuning values from steps 1 and 2 into effect immediately, the following
example script can be used:
#!/bin/bash
/usr/lpp/mmfs/bin/mmlsnsd -X | /bin/awk ' { print $3 " " $5 } ' | \
/bin/grep dev |
while read device node ; do
device=$(echo $device | /bin/sed 's/\/dev\///' )
/usr/lpp/mmfs/bin/mmdsh -N $node "echo deadline >\
/sys/block/$device/queue/scheduler"
/usr/lpp/mmfs/bin/mmdsh -N $node "echo 16384 >\
/sys/block/$device/queue/max_sectors_kb"
/usr/lpp/mmfs/bin/mmdsh -N $node "echo 256 >\
/sys/block/$device/queue/nr_requests"
/usr/lpp/mmfs/bin/mmdsh -N $node "echo 32 >\
/sys/block/$device/device/queue_depth"
done
3. Disk cache checking
On clusters that do not run Hadoop/Spark workloads, disks used by IBM Spectrum Scale must have
physical disk write caching disabled, regardless of whether RAID adapters are used for these disks.
When running other (non-Hadoop/Spark) workloads, write caching on the RAID adapters can be
enabled if the local RAID adapter cache is battery protected, but the write cache on the physical disks
must not be enabled.
Check the specification for your RAID adapter to figure out how to turn on/off the RAID adapter
write cache, as well as the physical disk write cache.
For common SAS/SATA disks without RAID adapter, run the following command to check whether
the disk in question is enabled with physical disk write cache:
sdparm --long /dev/<diskname> | grep WCE
If WCE is 1, it means the disk write cache is on.
The following commands can be used to turn on/off physical disk write caching:
# turn on physical disk cache
sdparm -S -s WCE=1 /dev/<diskname>
# turn off physical disk cache
sdparm -S -s WCE=0 /dev/<diskname>
Note: The physical disk read cache must be enabled no matter what kind of disk is used. For
SAS/SATA disks without RAID adapters, run the following command to check whether the disk read
cache is enabled or not:
sdparm --long /dev/<diskname> | grep RCD
If the value of RCD (Read Cache Disable) is 0, the physical disk read cache is enabled. On Linux,
usually the physical disk read cache is enabled by default.
4. Tune vm.min_free_kbytes to avoid potential memory exhaustion problems.
When vm.min_free_kbytes is set to its default value, in some configurations it is possible to encounter
memory exhaustion symptoms even though free memory should be available. It is recommended that
vm.min_free_kbytes be set to between 5 and 6 percent of the total amount of physical memory, but no
more than 2 GB should be allocated for this reserve memory.
To tune this value, add the following into /etc/sysctl.conf and then run 'sysctl -p' on Red Hat or
SuSE:
vm.min_free_kbytes = <your-min-free-KBmemory>
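For example, a sketch that computes roughly 5 percent of physical memory and caps the reserve at 2 GB before applying it; the variable names are illustrative:
# TOTAL_MEM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
# VM_MIN_FREE_KB=$((TOTAL_MEM_KB*5/100))
# [ ${VM_MIN_FREE_KB} -gt 2097152 ] && VM_MIN_FREE_KB=2097152   # cap the reserve at 2 GB
# echo "vm.min_free_kbytes = ${VM_MIN_FREE_KB}" >> /etc/sysctl.conf
# sysctl -p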
Note: The first instance (copy) of the data is referred to as the first replica. For example, setting
DefaultDataReplicas=1 (via the '-r 1' option to mmcrfs) results in only a single copy of each piece of
data, which is typically not desirable for a shared-nothing environment.
Query the number of replicas kept for any given file system by running the command:
/usr/lpp/mmfs/bin/mmlsfs <filesystem_name> | egrep " -r| -m"
Change the level of data and metadata replication for any file system by running mmchfs by using
the same -r (DefaultDataReplicas) and -m (DefaultMetadataReplicas) flags to change the default
replication options and then mmrestripefs (with the -R flag) to restripe the file system to match the
new default replication options.
For example:
/usr/lpp/mmfs/bin/mmchfs <filesystem_name> -r <NewDefaultDataReplicas> -m <NewDefaultMetadataReplicas>
/usr/lpp/mmfs/bin/mmrestripefs <filesystem_name> -R
2. Additional considerations for the file system:
For more information, see the topics mmchfs command and mmcrfs command in the IBM Spectrum Scale:
Command and Programming Reference.
3. Define the data and the metadata distribution across the NSD server nodes in the cluster:
Ensure that clusters larger than 4 nodes are not defined with a single (dataAndMetadata) system
storage pool.
For performance and RAS reasons, it is recommended that data and metadata be separated in some
configurations (which means that not all the storage is defined to use a single dataAndMetadata
system pool).
These guidelines focus on the RAS considerations related to the implications of losing metadata
servers from the cluster. In IBM Spectrum Scale Shared Nothing configurations (which recommend
setting the unmountOnDiskFail=meta option), a given file system is unmounted when the number of
nodes experiencing metadata disk failures is equal to or greater than the value of the
DefaultMetadataReplicas option defined for the file system (the -m option to the mmcrfs command as
per above). So, for a file system with the typically configured value DefaultMetadataReplicas=3, the
file system will unmount if metadata disks in three separate locality group IDs fail (when a node
fails, all the internal disks in that node will be marked down).
Note: All the disks in the same file system on a given node must have the same locality group ID.
The Locality ID refers to all three elements of the extended failure group topology vector (For
example, the vector 2,1,3 could represent rack 2, rack position 1, node 3 in this portion of the rack).
To avoid file system unmounts associated with losing too many nodes serving metadata, it is
recommended that the number of metadata servers be limited when possible. Also metadata servers
must be distributed evenly across the cluster to avoid the case of a single hardware failure (such as
the loss of a frame/rack or network switch) leading to multiple metadata node failures.
Some suggestions for separating data and metadata based on cluster size:
Depending on the available disks, both data and metadata can be stored on each disk (in which case
all the NSDs are defined as dataAndMetadata), or the disks can be specifically allocated for data or metadata.
If the number of disks per node is 3 or fewer, define all the disks as dataAndMetadata.
If the number of disks per node is larger than 3 and your applications are metadata I/O sensitive, use a 1:3
ratio of metadataOnly disks to dataOnly disks. If your applications are not metadata I/O sensitive,
consider using one metadataOnly disk per node.
6 - 9 nodes: Five nodes must serve metadata disks.
Assign one node per virtual rack, where each node is one unique failure group. Among these nodes,
select 5 nodes with metadata disks and configure the other nodes with data-only disks.
For the number of metadata disks, if you are not considering IOPS for metadata, you can select one disk
as a metadata NSD from each of the above 5 nodes with metadata disks; the other disks from these 5 nodes are
used as data disks. If you are considering IOPS for metadata, you could use a 1:3 ratio for
metadata:data.
For example, if you have 8 nodes with 10 disks per node, you have 80 disks in total. If you
use the 1:3 ratio, you could have 20 disks for metadata, selecting 4 disks per node from the
above 5 nodes as metadata NSD disks. All other disks are configured as data NSDs.
10 - 19 nodes: There are several different layouts. For example, for a 10-node cluster, use 2 nodes per virtual rack; for a
20-node cluster, you can use either 4 nodes per virtual rack or 2 nodes per virtual rack; for a
15-node cluster, you can use 3 nodes per virtual rack.
You must keep at least 5 failure groups for metadata and data. This ensures that you have enough
failure groups to restripe data when failures occur in 2 failure groups.
To keep it simple, it is suggested that every 2 nodes be defined as a virtual rack, with the first
element of the extended failure group kept the same for nodes in the same virtual rack, and that every
virtual rack have a node with metadata disks defined.
For example, for an 18-node cluster node1~node18, node1 and node2 are considered one virtual rack.
You can select some disks from node1 as metadataOnly disks, and the other disks from node1 and all
disks from node2 as dataOnly disks. Ensure that these nodes are in the same failure group (for
example, all dataOnly disks from node1 are in failure group 1,0,1 and all dataOnly disks from node2 are
in failure group 1,0,2).
20 or more nodes: Usually, it is recommended that the number of virtual racks be greater than 4 but less than 32, with each
rack containing the same number of nodes. Each rack is defined as one unique failure group, so you
have 5 or more failure groups and can tolerate failures from 2 failure groups when restriping data. Select one
node from each rack to serve as a metadata node.
For example, for a 24-node cluster, you can split the cluster into 6 virtual racks with 4 nodes per
rack. For a 21-node cluster, it is recommended to use 7 virtual racks with 3 nodes per rack. For a node
number larger than 40, as a starting point, it is recommended that approximately every 10 nodes
be defined as a virtual rack, with the first element of the extended failure group kept the same
for nodes in the same virtual rack. As for metadata, every virtual rack should have one node with
metadataOnly disks defined. If you have more than 10 racks, you can select only 5~10 virtual
racks to be configured with metadata disks.
How many disks must be configured as metadataOnly disks on a node that is selected for
metadata depends on the exact disk configuration and workloads. For example, if you
configure one SSD per virtual rack, defining the SSD from each virtual rack as a metadataOnly disk
works well for most workloads.
Note: The Linux buffer pool cache is not used for IBM Spectrum Scale file systems. The
recommended size of the pagepool attribute depends on the workload and the expectations for
improvements due to caching. A good starting point recommendation is somewhere between 10%
and 25% of real memory. If machines with different amounts of memory are installed, use the -N
option to mmchconfig to set different values according to the memory installed on the machines in
the cluster. Though these are good starting points for performance recommendations, some
customers use relatively small page pools, such as between 2-3% of real memory installed,
particularly for machines with more than 256GB installed.
The following example shows how to set a page pool size equal to 10% of the memory (this assumes
all the nodes have the same amount of memory installed):
TOTAL_MEM=$(cat /proc/meminfo | grep MemTotal | tr -d "[:alpha:]" | tr -d "[:punct:]" | tr -d "[:blank:]")
PERCENT_OF_MEM=10
PAGE_POOL=$((${TOTAL_MEM}*${PERCENT_OF_MEM}/(100*1024)))
mmchconfig pagepool=${PAGE_POOL}M -i
9. Change the following IBM Spectrum Scale configuration options and then restart IBM Spectrum
Scale.
Note: For IBM Spectrum Scale 4.2.0.3 or 4.2.1 and later, the restart of IBM Spectrum Scale can be
delayed until the next step, because tuning workerThreads will require a restart.
Set each configuration option individually:
mmchconfig readReplicaPolicy=local
mmchconfig unmountOnDiskFail=meta
mmchconfig restripeOnDiskFailure=yes
mmchconfig nsdThreadsPerQueue=10
mmchconfig nsdMinWorkerThreads=48
mmchconfig prefetchaggressivenesswrite=0
mmchconfig prefetchaggressivenessread=2
| For versions of IBM Spectrum Scale earlier than 5.0.2, also set one of the following values:
| mmchconfig maxStatCache=512
| mmchconfig maxStatCache=0
| In versions of IBM Spectrum Scale earlier than 5.0.2, the stat cache is not effective on the Linux
| platform unless the Local Read-Only Cache (LROC) is configured. For more information, see the
| description of the maxStatCache parameter in the topic mmchconfig command in the IBM Spectrum
| Scale: Command and Programming Reference.
Set all the configuration options at once by using the mmchconfig command:
mmchconfig readReplicaPolicy=local,unmountOnDiskFail=meta,
restripeOnDiskFailure=yes,nsdThreadsPerQueue=10,nsdMinWorkerThreads=48,
prefetchaggressivenesswrite=0,prefetchaggressivenessread=2
| For versions of IBM Spectrum Scale earlier than 5.0.2, also include one of the following expressions:
| maxStatCache=512 or maxStatCache=0.
The maxMBpS tuning option must be set as per the network bandwidth available to IBM Spectrum
Scale. If you are using one 10 Gbps link for the IBM Spectrum Scale network traffic, the default
value of 2048 is appropriate. Otherwise scale the value of maxMBpS to be about twice the value of
the network bandwidth available on a per node basis.
For example, for two bonded 10 Gbps links an appropriate setting for maxMBpS is:
mmchconfig maxMBpS=4000 # this example assumes a network bandwidth of about
2GB/s (or 2 bonded 10 Gbps links) available to Spectrum Scale
Note: For IBM Spectrum Scale 4.2.0.3 or 4.2.1 or later, it is recommended that the following
configuration parameters not be changed (setting workerThreads to 512, or (8*cores per node),
will auto-tune these values): parallelWorkerThreads, logWrapThreads, logBufferCount,
maxBackgroundDeletionThreads, maxBufferCleaners, maxFileCleaners, syncBackgroundThreads,
syncWorkerThreads, sync1WorkerThreads, sync2WorkerThreads, maxInodeDeallocPrefetch,
flushedDataTarget, flushedInodeTarget, maxAllocRegionsPerNode, maxGeneralThreads,
worker3Threads, and prefetchThreads.
After you enable auto-tuning by tuning the value of workerThreads, if you previously changed
any of these settings (parallelWorkerThreads, logWrapThreads, etc) you must restore them back
to their default values by running mmchconfig <tunable>=Default.
b. For IBM Spectrum Scale 4.1.0.x, 4.1.1.x, 4.2.0.0, 4.2.0.1, 4.2.0.2, the default values will work for
most scenarios. Generally only worker1Threads tuning is required:
mmchconfig worker1Threads=72 -i # for Spectrum Scale 4.1.0.x, 4.1.1.x,
4.2.0.0, 4.2.0.1, 4.2.0.2
For IBM Spectrum Scale 4.1.0.x, 4.1.1.x, 4.2.0.0, 4.2.0.1, 4.2.0.2, worker1Threads=72 is a good
starting point (the default is 48), though larger values have been used in database environments
and other configurations that have many disks present.
11. Customers running IBM Spectrum Scale 4.1.0, 4.1.1, and 4.2.0 must change the default configuration
of trace to run in overwrite mode instead of blocking mode.
To avoid potential performance problems, customers running IBM Spectrum Scale 4.1.0, 4.1.1, and
4.2.0 must change the default IBM Spectrum Scale tracing mode from blocking mode to overwrite
mode as follows:
/usr/lpp/mmfs/bin/mmtracectl --set --trace=def --tracedev-writemode=
overwrite --tracedev-overwrite-buffer-size=500M # only for Spectrum
Scale 4.1.0, 4.1.1, and 4.2.0
This assumes that 500MB can be made available on each node for IBM Spectrum Scale trace buffers.
If 500MB are not available, then set a lower appropriately sized trace buffer.
12. Consider whether pipeline writing must be enabled.
By default, the data ingestion node writes the 2 or 3 replicas of the data to the target nodes over the
network in parallel when pipeline writing is disabled (enableRepWriteStream=0). This takes
additional network bandwidth. If pipeline writing is enabled, the data ingestion node writes only
one replica over the network, and the target node writes the additional replica. Enabling pipeline
writing (mmchconfig enableRepWriteStream=1 and restarting the IBM Spectrum Scale daemon on all
nodes) can increase I/O write performance in the following two scenarios:
a. Data is ingested from the IBM Spectrum Scale client and the network bandwidth from the
data-ingesting client is limited.
The following configurations can be changed by using mmchconfig as per the needs of the system
workload:
Configuration                   Default Value   Recommended   Comment
forceLogWriteOnFdatasync        Yes             No
disableInodeUpdateOnFdatasync   No              Yes
dataDiskCacheProtectionMethod   0               2             Change this to 2 if you turn on dataOnly disk write cache (without battery protection).
For Hadoop-like workloads, one JVM process can open a lot of files. Therefore, tune the ulimit values:
vim /etc/security/limits.conf
# add the following lines at the end of /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
kernel.pid_max
Usually, the default value is 32K. If you see errors such as a failure to allocate memory or unable to create new native
thread, try to increase kernel.pid_max by adding kernel.pid_max=99999 at the end of /etc/sysctl.conf
and then running sysctl -p.
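A minimal sketch of that change:
# echo "kernel.pid_max = 99999" >> /etc/sysctl.conf
# sysctl -p
# sysctl kernel.pid_max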
Configuration              Default Value   Recommended   Comment
enableLinuxReplicatedAio   N/A             Yes           The default value depends on the release of IBM Spectrum Scale.
preStealPct                1               See below     Only for Direct I/O.
Database workload customers using direct I/O must also apply the following preStealPct tuning if they
are running one of these IBM Spectrum Scale levels:
v 3.5 (any PTF level)
v 4.1.1 (below PTF 10)
v 4.2.0 (any PTF level)
v 4.2.1 (below PTF 2).
The database workload customers with direct I/O enabled who are running older code levels must tune
preStealPct as follows:
echo 999 | mmchconfig preStealPct=0 -i
After upgrading IBM Spectrum Scale from one of the previously referenced older code levels to a
higher level (especially 4.1.1 PTF 10, 4.2.1 PTF 2, or 4.2.2.0 or higher), you can set the configuration
option preStealPct back to its default value as follows:
echo 999 | mmchconfig preStealPct=1 -i
It is possible that, even after employing the above techniques during data ingest, the cluster becomes
unbalanced as nodes and disks are added or removed. You can check whether the data in the cluster is
balanced by using the mmdf command. If data disks in different nodes are showing uneven disk usage,
rebalance the cluster by running the mmrestripefs -b command. Keep in mind that the rebalancing
command causes additional I/O activity in the cluster. Therefore, plan to run it at a time when workload
is light.
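A short sketch of that check-and-rebalance sequence; the file system name is illustrative:
# mmdf gpfs-fpo-fs
# mmrestripefs gpfs-fpo-fs -b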
If data must be exported into another IBM Spectrum Scale cluster, the AFM function can be used to
replicate data into a remote IBM Spectrum Scale cluster.
Upgrading FPO
If the applications that run over the cluster can be stopped, you can shut down the entire GPFS
cluster and upgrade FPO. However, if the applications cannot be stopped, you need to
use the rolling-upgrade procedure to upgrade the nodes.
Prerequisites
v Ensure that all disks are in a ready status and up availability. You can check by issuing the mmlsdisk
fs-name -L command.
v Verify whether the upgraded-to GPFS version is compatible with the running version from IBM
Spectrum Scale FAQ in IBM Knowledge Center (www.ibm.com/support/knowledgecenter/STXKQY/
gpfsclustersfaq.html). For example, you cannot upgrade GPFS from 3.4.0.x directly into 3.5.0.24. You
need to upgrade to 3.5.0.0 first and then upgrade to the latest PTF. You also need to verify whether the
operating system kernel version and the Linux distro version are compatible with GPFS from IBM
Spectrum Scale FAQ in IBM Knowledge Center (www.ibm.com/support/knowledgecenter/STXKQY/
gpfsclustersfaq.html).
v Find a time period when the whole system workload is low, or reserve a maintenance window to
do the upgrade. When the cluster manager or a file system manager goes down, intentionally or accidentally,
another node is elected to take over the management role, but it takes time to keep the cluster configuration
and the file system data consistent.
v When a file system manager is elected by the cluster manager, it does not change even if the file system is
unmounted on that node. If the file system is mounted on other nodes, it is also internally mounted on
the file system manager node. This does not affect your ability to unload the kernel modules and upgrade
GPFS without a reboot.
When these disks are in the ready status but some of them are in the down availability, you can
start these disks with one of the following commands:
mmchdisk <fsName> start -a
mmchdisk <fsName> start -d <diskList>
This might take a while because GPFS must do an incremental data sync to bring all data on these
disks up to date. The time needed depends on how much data changed while the disks were in
the suspended status. You must wait for the mmchdisk start command to finish before performing
the next step.
Confirm that all disks are in the ready status and up availability with the following command:
mmlsdisk <fsName>
11. Mount the GPFS file system.
When all disks in the file system are up, you can mount the file system:
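For example, a hedged sketch that mounts the file system on all nodes and confirms the mounts:
# mmmount <fsName> -a
# mmlsmount <fsName> -L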
The SNMP agent software consists of a master agent and a set of subagents, which communicate with the
master agent through an agent/subagent protocol, the AgentX protocol in this case.
The SNMP subagent runs on a collector node of the IBM Spectrum Scale cluster. The collector node is
designated by the system administrator by using the mmchnode command.
The Net-SNMP master agent, also called the SNMP daemon, or snmpd, must be installed on the
collector node to communicate with the IBM Spectrum Scale subagent and with your SNMP management
application. Net-SNMP is included in most Linux distributions and should be supported by your Linux
vendor.
For more information about enabling SNMP support, see the GPFS SNMP support topic in the IBM
Spectrum Scale: Problem Determination Guide.
Refer to the GPFS SNMP support topic in the IBM Spectrum Scale: Administration Guide for further
information about enabling SNMP support.
When you install IBM Spectrum Scale, you can enable IBM Spectrum Scale monitoring using the IBM
BigInsights installation program. If the monitoring was not enabled at the time of installation, it can be
done later by installing the Net-SNMP master agent on the collector node to communicate with the IBM
Spectrum Scale subagent and the IBM BigInsights Console. Detailed instructions are provided in the
Enabling monitoring for GPFS topic in the IBM InfoSphere® BigInsights Version 2.1.2 documentation.
Rolling upgrades
During a regular upgrade, the IBM Spectrum Scale service is interrupted. For a regular upgrade, you
must shut down the cluster and suspend the application workload of the cluster. During a rolling
upgrade, there is no interruption in the IBM Spectrum Scale service. In a rolling upgrade, the system is
upgraded node by node or failure group by failure group. During the upgrade, IBM Spectrum Scale runs
on a subset of nodes.
In a rolling upgrade, nodes from the same failure group must be upgraded at the same time. If nodes
from two or more failure groups stop functioning, only a single data copy is available online. Also, if a
quorum node stops functioning, the quorum relationship in the cluster can be broken. Therefore, quorum
nodes must be handled separately from the rolling upgrade of their failure group so that quorum is maintained.
After the node is rebooted, the disk status of the node is uncertain. The status of the node is dependent
upon the auto recovery configuration (mmlsconfig restripeOnDiskFailure) and the IO operations over
the cluster.
Note: In a large cluster, some nodes might take a while to start. If the -A option is not set to no,
unnecessary disk IO might cause some disks from slow nodes to be marked as non functional.
5. Shut down IBM Spectrum Scale on the nodes by running the following command:
mmshutdown -N <nodeList>
To confirm IBM Spectrum Scale has stopped functioning on these nodes, run the following
command: mmgetstate -a
6. Upgrade IBM Spectrum Scale or perform the maintenance procedure on the whole cluster.
7. Start the IBM Spectrum Scale cluster. After everything has been installed and the portability layer has
been built, start IBM Spectrum Scale by running the following command: mmstartup -a.
To confirm that IBM Spectrum Scale is active on the upgraded nodes, run the following command:
mmgetstate -a.
8. When IBM Spectrum Scale is active on all nodes, check the state of all disks by running the
following command: mmlsdisk <fsName> -e. If some disks in the file system do not have the Up
availability and the Ready status, run the mmchdisk <fsName> start -a command so that the disks
start functioning. Run the mmchdisk <fsName> resume -a command so that the suspended and
to-be-emptied disks become available.
9. When all the disks in the file system are functioning, mount the file system by running the following
command: mmmount <fsName> -N <nodeList>
Confirm that the IBM Spectrum Scale file system has mounted by running the following command:
mmlsmount <fsName> -L
10. To enable auto recovery for disk failure, run the following command: mmchconfig
restripeOnDiskFailure=yes -i
Ensure that you use the -i option so that this change takes effect immediately and permanently.
11. To enable the Automatic mount option, run the following command: mmchfs <fsName> -A yes.
12. If you have upgraded IBM Spectrum Scale version in step 6, upgrade the IBM Spectrum Scale cluster
version and file system version.
If all applications run without any issues, run the mmchconfig release=LATEST command to upgrade
the cluster version to the latest level. Then, to enable backward-compatible file system format changes,
run the mmchfs <fsName> -V compat command.
Note: After running the mmchconfig release=LATEST command, you cannot revert the cluster release
version to an older version. After running the mmchfs -V compat command, you cannot revert the file
system version to an older version.
For major IBM Spectrum Scale upgrade, check IBM Spectrum Scale FAQ in IBM Knowledge Center
(www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html) or contact
[email protected] before running the mmchfs -V full command to verify the compatibility between
the different IBM Spectrum Scale major versions. For information about specific file system format
and function changes, see Chapter 14, “File system format changes between versions of IBM
Spectrum Scale,” on page 161.
To check the node state, issue the mmgetstate command with or without the -a option, as in the following
examples:
1) mmgetstate
2) mmgetstate -a
Be aware of the differences between the down, unknown, and unresponsive states:
v A node in the down state is reachable but the GPFS daemon on the node is not running or is recovering
from an internal error.
v A node in the unknown state cannot be reached from the node on which the mmgetstate command was
run.
v A node in the unresponsive state is reachable but the GPFS daemon on the node is not responding.
To follow up on investigating the state of a node, check if the node is functioning or has a network issue.
For more information, see the topic mmgetstate command in the IBM Spectrum Scale: Command and
Programming Reference.
To check the state of the disks in an IBM Spectrum Scale cluster, run the mmlsdisk <fsName> -e command. This
command lists all the disks that do not have the ready status and up availability.
The IBM Spectrum Scale log files are saved in the /var/adm/ras/ directory on each node. Each time the
IBM Spectrum Scale daemon starts, a new log file is created. The mmfs.log.latest log file is the link to the
latest log. On Linux, all additional information is sent to the system log in /var/log/messages.
Because the IBM Spectrum Scale cluster manager and file system managers handle cluster issues such as
node leaves or disk down events, monitor the IBM Spectrum Scale log on the cluster manager and file
system manager to get the best view of the cluster and file system status.
Disk Failures
This section describes how to handle a disk failure.
In an FPO deployment model with IBM Spectrum Scale, the restripeOnDiskFailure configuration
parameter should be set to yes. When a disk stops functioning, auto recovery attempts to bring the disk
back into service. Auto recovery enlists the help of other nodes in the cluster to recover data, which can
affect the file system I/O performance on all nodes, because data might have to be copied from a valid
disk to recover the failed disk.
| In the following example, the tsrestripefs process is running in the back end (line 6) and its
| command ID is #92 (line 5):
| # mmfsadm command list all
| CrHashTable 0x7F7E64001A08 n 4
| cmd sock 75 cookie 3489916426 owner 12912 id 0x2D7ADC0785000064(#100) uses 1 type 14 start 1531294737.470181
| flags 0x106 SG none line ’command list all’
| cmd sock 70 cookie 2102087586 owner 4450 id 0x2D7ADC078500005C(#92) uses 1 type 13 start 1531294660.218091
| flags 0x117 SG fpofs line ’tsrestripefs /dev/fpofs -r’
| hold PIT/repair waitTime 6.082489
| 3) If a back-end process is running, issue the following command to stop it:
| mmfsadm command stop <commandID>
| where <commandID> is the command ID of the back-end process from the previous step. The
| following example uses command ID 92 from the example in the previous step:
| mmfsadm command stop 92
| 4) Run the mmfsadm command again to verify that the process is no longer running:
| mmfsadm command list all
Deleting disks when auto recovery is not enabled (check this by mmlsconfig
restripeOnDiskFailure):
Deleting NSD disks from the file system can trigger disk or network traffic because of data protection. If
your cluster is busy with application IO and the application IO performance is important, schedule a
maintenance window to delete these broken disks from your file system. Follow the steps in the “Starting
the disk failure recovery” on page 547 section to check if a disk is physically broken and handle the
broken disks.
When I/O operations are performed on physically broken disks, IBM Spectrum Scale marks the
disks as non-functional. Auto recovery suspends the disks if it fails to change the availability of the disks
to up, and restripes the data off the suspended disks. In releases earlier than IBM Spectrum Scale 4.1.0.4,
deleting the non-functional disks triggers heavy I/O traffic (especially for metadata disks). Starting with IBM
Spectrum Scale 4.1.0.4, the mmdeldisk command has been improved: if the data on the non-functional disks has
already been restriped, the disk status is Emptied, and mmdeldisk deletes non-functional disks
with the Emptied status without additional I/O traffic.
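The following is a hedged sketch of removing such disks after their data has been restriped; the NSD names are illustrative:
# confirm that the broken disks show the Emptied status
# mmlsdisk <fsName> -e
# mmdeldisk <fsName> "node12_data_sdb;node12_data_sdc"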
Node failure
In an FPO deployment, each node has locally attached disks. When a node fails or has a connection
problem with other nodes in the cluster, the disks in this node become unavailable. Rebooting a node to
repair a hardware issue or to patch the operating system kernel is also treated as a node failure.
If you want to reboot a node or enable some configuration change that requires a reboot and have it
recovered without auto recovery, check the auto recovery wait time. The auto recovery wait time is
defined by the minimum value of minDiskWaitTimeForRecovery, metadataDiskWaitTimeForRecovery and
dataDiskWaitTimeForRecovery. By default, minDiskWaitTimeForRecovery is 1800 seconds,
metadataDiskWaitTimeForRecovery is 2400 seconds and dataDiskWaitTimeForRecovery is 3600 seconds. If
the reboot is completed within the auto recovery wait time, it is safe to unmount the file system, shut
down IBM Spectrum Scale, and reboot your node without having to disable auto recovery.
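A minimal sketch of checking the effective wait times before planning the reboot, assuming mmlsconfig accepts an attribute name to display a single value:
# mmlsconfig minDiskWaitTimeForRecovery
# mmlsconfig metadataDiskWaitTimeForRecovery
# mmlsconfig dataDiskWaitTimeForRecovery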
When you want to perform hardware maintenance for a node that must be shut down for a long time,
follow the same steps mentioned in IBM Spectrum Scale Rolling Upgrade Procedure and perform
hardware maintenance.
However, if one file system has only two failure groups for metadata or data with default replica two, or
if one file system has only 3 failure groups for metadata or data with default replica 3, auto recovery
must be disabled (mmchconfig restripeOnDiskFailure=no -N all) in IBM Spectrum Scale 4.1.x, 4.2.x and
5.0.0. The issue is fixed in IBM Spectrum Scale 5.0.1 and later.
Usually, if the concurrent failed nodes are less than maxFailedNodesForRecovery, auto recovery will
protect data against node failure or disk failure. If the concurrent failed nodes are larger than
maxFailedNodesForRecovery, auto recovery exits without any action and the administrator has to take
some actions to recover it.
With unmountOnDiskFail configured as meta, if a file system SGPanic is reported when nodes are
non-functional, then metadata disks on three or more nodes, or metadata disks in three or more failure
groups, are down at the same time. Follow the steps in the section 8.1 to fix the issues. Run the mmfsck -n
command to scan the file system and ensure that mmfsck reports the message: File system is
clean. If mmfsck -n does not report "File system is clean", open a PMR to report the
issue and fix it with guidance from IBM Spectrum Scale support.
In an FPO cluster, if Auto recovery is enabled and there are more than maxFailedNodesForRecovery non
functional nodes, auto recovery does not recover the nodes. By default, maxFailedNodesForRecovery is
three nodes. You can change this number depending on your cluster configuration.
A switch network failure can cause nodes to be reported as non functional. If you want auto recovery to
protect against switch network failures, careful planning is required in setting up the FPO cluster. For
example, a network switch failure must not bring disks (with metadata) down from 3 or more failure
groups, and maxFailedNodesForRecovery must be configured to a value that is larger than the number of
down nodes that will result from a switch network failure.
Data locality
In an FPO cluster, if the data storage pool is enabled with allowWriteAffinity=yes, the data locality is
decided by the following order:
v WADFG is set by mmchattr or the policy.
v Default WAD or WAD is set by policy and the data ingesting node.
If the file is set with WADFG, the locality complies with the WADFG regardless of where the data is
ingested. If the file is not set with WADFG, the locality is decided according to the WAD and the
data-ingesting node. Also, data locality configurations are best effort: if there are no
disks available that comply with the configured data locality, IBM Spectrum Scale FPO stores the data
on other disks.
All disks in a node must be configured as the same failure or locality group. After a disk is
nonfunctional, mmrestripefs -r from auto recovery suspends the disk and restripes the data on the
nonfunctional disks onto other disks in the same locality group. The data locality is not broken because
the data from local disks is still in that node. If you do not have other disks available in the same locality
group, mmrestripefs –r from auto recovery restripes the data on the nonfunctional disks onto other
nodes, breaking the data locality for the applications running over that node.
If the file is not set with WADFG (by policy or by mmchattr), both mmrestripefile -b and mmrestripefs
-b might break the data locality.
If the file is not set with WADFG (by policy or by mmchattr), mmrestripefs -l might break the data
locality. The node running mmrestripefile -l is considered as the data writing node and all first replica
of data is stored in the data writing node for an FPO-enabled storage pool.
The following sections describe the steps to check whether your data locality is broken and how to fix it if needed.
Perform the following steps to check the data locality, depending on the IBM Spectrum Scale release:
v For IBM Spectrum Scale 4.2.2.0 and earlier, run /usr/lpp/mmfs/samples/fpo/tsGetDataBlk.
v For IBM Spectrum Scale 4.2.2.x, run /usr/lpp/mmfs/samples/fpo/mmgetlocation.
v For IBM Spectrum Scale 4.2.3, mmgetlocation also supports the -Y option.
Refer to the usage output of /usr/lpp/mmfs/samples/fpo/mmgetlocation for the available options. You can run
/usr/lpp/mmfs/samples/fpo/mmgetlocation -f <absolute-file-path> to get the block location of
<absolute-file-path>. You can also run /usr/lpp/mmfs/samples/fpo/mmgetlocation -d
<absolute-dir-path> to get the block location summary of <absolute-dir-path>.
[FILE INFO]
------------------------------------------------------------------------
blockSize 1024 KB
blockGroupFactor 128
metadataBlockSize 131072K
writeAffinityDepth 1
flags:
data replication: 2 max 2
storage pool name: fpodata
metadata replication: 2 max 2
[SUMMARY INFO]
----------------------------------------------------------------------------------------------------------
Replica num Nodename TotalChunks
The summary at the end of the output shows that, for the file /sncfs/file1G, 8 chunks of the first replica
are located on the node c8f2n04. The 8 chunks of the second replica are located on the c8f2n05 node.
For IBM Spectrum Scale 4.2.2.0 and earlier, perform the following steps to get the block location of files.
cd /usr/lpp/mmfs/samples/fpo/
g++ -g -DGPFS_SNC_FILEMAP -o tsGetDataBlk -I/usr/lpp/mmfs/include/ tsGetDataBlk.C -L/usr/lpp/mmfs/lib/ -lgpfs
./tsGetDataBlk <filename> -s 0 -f <data-pool-block-size * blockGroupFactor> -r 3
In the above example, the block size of data pool is 2 Mbytes, the blockGroupFactor of the data pool is
128. So, the META_BLOCK (or chunk) size is 2MB * 128 = 256Mbytes. Each output line represents one
chunk. For example, Block 0 in the above is located in the disks with disk id 2, 4 and 6 for 3 replica.
To know the node on which the three replicas of Block 0 are located, check the mapping between disk ID
and nodes:
Check the mapping between disks and nodes by mmlsdisk (the 9th column is the disk id of NSD) and
mmlsnsd:
[root@gpfstest2 sncfs]# mmlsdisk sncfs -L
disk driver sector failure holds holds avail- storage
name type size group metadata data status ability disk id pool remarks
------------ -------- ------ ----------- -------- ----- ------- --------- ------- --------- ---------
node1_sdb nsd 512 1 Yes No ready up 1 system desc
node1_sdc nsd 512 1,0,1 No Yes ready up 2 datapool
The three replicas of Block 0 are located in disk ID 2 (NSD name node1_sdc, node name is
gpfstest1.cn.ibm.com), disk ID 4 (NSD name node2_sdb, node name is gpfstest2.cn.ibm.com), and disk ID
6 (NSD name node6_sdc, node name is gpfstest6.cn.ibm.com). Check each block of the file to see if the
blocks are located correctly. If the blocks are not located correctly, fix the data locality.
mmgetlocation:
Synopsis
mmgetlocation {[-f filename] | [-d directory]}
[-r {1|2|3|all}]
[-b] [-L] [-l] [-Y] [--lessDetails]
[-D [diskname,diskname,...]]
[-N [nodename,nodename,...]]
Parameters
-f filename
Specifies the file whose block location you want to query. It must be an absolute file path. For a single file,
the system displays the block/chunk information and the file block summary information.
-d directory
Specifies the directory whose block location you want to query. All files under <directory> are
checked and summarized together. <directory> must be an absolute directory path. The system
displays one block summary for each file and one directory summary with the block information. The
options -f and -d are mutually exclusive.
Notes
1. Tested only on Linux.
2. Subdirectories are not processed recursively if the option -d is specified.
3. For FPO, if both -D and -N are specified, the -N option must specify only one node, because no two
NSDs in FPO belong to the same node.
4. For mmgetlocation -Y, the system displays the output in the following formats:
a. mmgetlocation:fileSummary:filepath:blockSize:metadataBlockSize:dataReplica:metadataReplica:
storagePoolName:allowWriteAffinity:writeAffinityDepth:blockGroupFactor:(-Y -L specified)
b. mmgetlocation:fileDataInfor:chunkIndex:offset:NSDName:NSDServer:diskID:failureGroup:reserved:
(The fields NSDName:NSDServer:diskID:failureGroup:reserved: are repeated for each of the 2 or 3 replicas.
If the option -L is not specified, the diskID and failureGroup values are blank.)
c. mmgetlocation:fileDataSummary:path:replicaIndex:nsdServer:nsdName:blocks: (-l specified)
mmgetlocation:fileDataSummary:path:replicaIndex:nsdServer:blocks: (-l not specified)
(If there is more than one NSD for a replica, each one is output as a separate line. If the value of
nsdName in a line is all, the option -l was not given.)
d. mmgetlocation:dirSummary:path:replicaIndex:nsdServer:nsdName:blocks:(-l specified)
Note: If the value of nsdName in a line is all, the option -l was not given. So, for the option -f, the
output consists of the formats a, b, and c.
Examples
v /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /sncfs/file1G
From the summary at the end of the output, you can see that, for the file /sncfs/file1G,
8 chunks of the 1st replica are located on the node c3m3n03,
the 8 chunks of the 2nd replica are located on the nodes c3m3n04 and c3m3n02, and
the 8 chunks of the 3rd replica are located on the nodes c3m3n04 and c3m3n02.
v /usr/lpp/mmfs/samples/fpo/mmgetlocation -d /sncfs/t2 -L -Y
mmgetlocation:fileDataSummary:path:replicaIndex:nsdServer:blocks:
mmgetlocation:fileDataSummary:/sncfs/t2/_partition.lst:1:c3m3n04:1:
mmgetlocation:fileDataSummary:/sncfs/t2/_partition.lst:2::1:
mmgetlocation:fileDataSummary:/sncfs/t2/_partition.lst:3::1:
mmgetlocation:fileDataSummary:path:replicaIndex:nsdServer:blocks:
mmgetlocation:fileDataSummary:path:replicaIndex:nsdServer:blocks:
mmgetlocation:fileDataSummary:/sncfs/t2/part-r-00000:1:c3m3n04:2:
mmgetlocation:fileDataSummary:path:replicaIndex:nsdServer:blocks:
mmgetlocation:fileDataSummary:/sncfs/t2/part-r-00002:1:c3m3n04:2:
mmgetlocation:fileDataSummary:path:replicaIndex:nsdServer:blocks:
mmgetlocation:fileDataSummary:/sncfs/t2/part-r-00001:1:c3m3n02:2:
mmgetlocation:dirDataSummary:path:replicaIndex:nsdServer:blocks:
mmgetlocation:dirDataSummary:/sncfs/t2/:1:c3m3n04:5:
mmgetlocation:dirDataSummary:/sncfs/t2/:1:c3m3n02:2:
v /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /sncfs/file1G -Y -L
mmgetlocation:fileSummary:filepath:blockSize:metadataBlockSize:dataReplica:metadataReplica:
storagePoolName:allowWriteAffinity:writeAffinityDepth:blockGroupFactor:
mmgetlocation:fileSummary:/sncfs/file1G:1048576::3:3:fpodata:yes:1:128:
mmgetlocation:fileDataInfor:chunkIndex:offset:NSDName:NSDServer:diskID:failureGroup:
reserved:NSDName:NSDServer:diskID:failureGroup:reserved:NSDName:NSDServer:diskID:failureGroup:reserved:
mmgetlocation:fileDataInfor:0:0):data_c3m3n03_sdd:c3m3n03:5:3,0,0::data_c3m3n02_sdc:c3m3n02:3:1,0,
0::data_c3m3n04_sdc:c3m3n04:9:2,0,0::
mmgetlocation:fileDataInfor:1:134217728):data_c3m3n03_sdd:c3m3n03:5:3,0,0::data_c3m3n04_sdc:c3m3n04:9:2,0,
0::data_c3m3n02_sdc:c3m3n02:3:1,0,0::
mmgetlocation:fileDataInfor:2:268435456):data_c3m3n03_sdd:c3m3n03:5:3,0,0::data_c3m3n02_sdc:c3m3n02:3:1,0,
0::data_c3m3n04_sdc:c3m3n04:9:2,0,0::
mmgetlocation:fileDataInfor:3:402653184):data_c3m3n03_sdd:c3m3n03:5:3,0,0::data_c3m3n04_sdc:c3m3n04:9:2,0,
0::data_c3m3n02_sdc:c3m3n02:3:1,0,0::
mmgetlocation:fileDataInfor:4:536870912):data_c3m3n03_sdd:c3m3n03:5:3,0,0::data_c3m3n02_sdc:c3m3n02:3:1,0,
0::data_c3m3n04_sdc:c3m3n04:9:2,0,0::
mmgetlocation:fileDataInfor:5:671088640):data_c3m3n03_sdd:c3m3n03:5:3,0,0::data_c3m3n04_sdc:c3m3n04:9:2,0,
0::data_c3m3n02_sdc:c3m3n02:3:1,0,0::
mmgetlocation:fileDataInfor:6:805306368):data_c3m3n03_sdd:c3m3n03:5:3,0,0::data_c3m3n02_sdc:c3m3n02:3:1,0,
0::data_c3m3n04_sdc:c3m3n04:9:2,0,0::
mmgetlocation:fileDataInfor:7:939524096):data_c3m3n03_sdd:c3m3n03:5:3,0,0::data_c3m3n04_sdc:c3m3n04:9:2,0,
0::data_c3m3n02_sdc:c3m3n02:3:1,0,0::
mmgetlocation:fileDataSummary:path:replicaIndex:nsdServer:blocks:
mmgetlocation:fileDataSummary:/sncfs/file1G:1:c3m3n03:8:
mmgetlocation:fileDataSummary:/sncfs/file1G:2:c3m3n04:4:
mmgetlocation:fileDataSummary:/sncfs/file1G:2:c3m3n02:4:
mmgetlocation:fileDataSummary:/sncfs/file1G:3:c3m3n04:4:
mmgetlocation:fileDataSummary:/sncfs/file1G:3:c3m3n02:4:
v For IBM Spectrum Scale earlier than 4.2.2.0, perform the following steps to get the block location of files:
1. Compile the sample program:
cd /usr/lpp/mmfs/samples/fpo/
g++ -g -DGPFS_SNC_FILEMAP -o tsGetDataBlk -I/usr/lpp/mmfs/include/ tsGetDataBlk.C -L/usr/lpp/mmfs/lib/ -lgpfs
2. Run ./tsGetDataBlk <filename> -s 0 -f <data-pool-block-size * blockGroupFactor> -r 3
3. Check the output of the program tsGetDataBlk:
[root@gpfstest2 sncfs]# /usr/lpp/mmfs/samples/fpo/tsGetDataBlk /sncfs/test -r 3
File length: 1073741824, Block Size: 2097152
Parameters: startoffset:0, skipfactor: META_BLOCK, length: 1073741824, replicas 3
Parameters
Device
The device name of the file system to which the disks belong. File system names need not be
fully-qualified. fs0 is as acceptable as /dev/fs0. This must be the first parameter.
-s {[Fileset]:Snapshot | srcDir | filePath}
Snapshot is the snapshot name. If :Snapshot is specified, it refers to the global snapshot named Snapshot
in Device. If more than one snapshot matches :Snapshot or Snapshot, the command fails. Also, if it is a
fileset snapshot, ensure that the fileset is linked. srcDir is the source directory that is copied. The
directory must exist in Device. If the directory is the junction path of a fileset, the fileset must be
linked before running the script. filePath is the file path that is copied.
Note: The copy tasks are distributed at the file level (one file per copy task). The options -l and -b
are mutually exclusive. If neither -l nor -b is specified, -l is the default.
-f If the to-be-copied file already exists under targetDir, it is overwritten when the option -f is
specified; otherwise, the file is skipped.
-r When the option -s {srcDir} is specified, option -r copies the files recursively. For -s
{[Fileset]:Snapshot}, option -r is always true.
-v Displays verbose information.
-a All nodes in the cluster are involved in copying tasks.
-N {Node[,Node...] | NodeFile | NodeClass}
Directs a set of nodes to be involved in copying tasks. -a is the default if option -N is not specified.
Notes
1. If your file system mount point contains special characters other than +, -, or _, it is not supported by this script.
2. If the file path contains a special character, such as a blank or a line break, the file is not copied and a
warning is displayed.
3. When option -a or -N is specified, the file system for the -t targetDir must be mounted if it is from
external NFS or another IBM Spectrum Scale file system.
4. Only regular data files are copied; links and special files are not copied.
5. If a file is not copied, the file is reported and is not copied again in the same invocation.
6. You must specify option -s with a snapshot. For a directory, the file list is not rescanned to detect
newly created files or subdirectories.
IBM Spectrum Scale FPO provides an interface for you to control the placement of the first, second, and
third replicas of all blocks on specific nodes. For example, you can place the first replica of all blocks on
a specific node so that the applications running on that node can read all data from local disks.
Note: IBM Spectrum Scale FPO does not support controlling the location of only one block or a subset of
blocks. For example, you cannot control the location of block 1 or block 2 without also changing the
location of block 3.
This topic lists the steps to control the first replica of all blocks.
1. Check whether the file is configured with WADFG.
If you want to control the location of the first replica, the second replica, and the third replica, set the
WADFG attributes of the files by using mmchattr. If you are using IBM Spectrum Scale 4.1.1.0 or earlier,
perform these steps to restore the data locality:
1. Decide the location for the data replicas.
2. Run mmchattr --write-affinity-failure-group to set or update the new WADFG of the file.
In IBM Spectrum Scale 4.2.2.0 and later, mmrestripefile -l is optimized to reduce unnecessary replica
data movement. For example, if the original WADFG is (1;2;3) and it is changed into (4;2;3),
mmrestripefile -l moves only the first replica of all blocks. However, if it is changed into (4),
mmrestripefile -l might move the second and third replicas. Therefore, changing the original WADFG
from (1;2;3) into (4;2;3) is better than changing it into (4).
3. If you are using IBM Spectrum Scale 4.1.1.0 or later, skip this step. Run mmrestripefile -l filename
or mmrestripefile -b filename.
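As an illustration, the following sketch sets a new WADFG on an example file and then restripes only
that file; the file name and the failure-group values are illustrative:
mmchattr --write-affinity-failure-group "4;2;3" /sncfs/file1G
mmrestripefile -l /sncfs/file1G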
Disk Replacement
This topic describes how to replace a disk.
In a production cluster, you might need to replace physically broken or failed disks with new disks.
v If you have non-functional disks in two failure groups with replica 3, restripe the file system to protect
the data and avoid data loss if a disk in the third failure group also becomes non-functional.
v Replacing disks is time-consuming because the whole inode space must be scanned, which generates
I/O traffic in the cluster. Therefore, schedule the disk replacement when the cluster is not busy.
The mmrpldisk command can be used to replace a disk in the file system with a new disk; it handles one
disk per invocation. If you want to replace only one disk, see the mmrpldisk command.
Note: In FPO, sometimes the mmrpldisk command does not migrate all data from the to-be-replaced disk
to the newly added disk. This issue impacts IBM Spectrum Scale Release 3.5 and later. See the following
example:
[root@c8f2n03 ~]# mmlsdisk sncfs -L
disk driver sector failure holds holds avail- storage
name type size group metadata data status ability disk id pool remarks
------------ -------- ------ ----------- -------- ----- ------------- ------- ------- --------- --------
n03_0 nsd 512 1 Yes Yes ready up 1 system
n03_1 nsd 512 1 Yes Yes ready up 2 system desc
n04_0 nsd 512 2,0,0 Yes Yes ready up 3 system desc
n04_1 nsd 512 2,0,0 Yes Yes ready up 4 system
n05_1 nsd 512 4,0,0 No Yes ready up 5 system desc
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2
If you want to replace more than one disk, run the mmrpldisk command multiple times. Each invocation
triggers a PIT job that scans the whole inode space to migrate the data of the disk that is being replaced,
which generates I/O traffic and is time-consuming. To speed up the replacement process, see the
following subsections to replace more than one disk in the file system.
If you want to replace more than one disk in the file system, and the file system contains many files or a
large amount of data, using the mmrpldisk command for each disk takes a long time.
If you have additional idle disk slots, you can plug new disks into these idle slots, run mmcrnsd to
create new NSDs on the disks that are to be added, run mmadddisk (without the option -r) to add the
new disks into the file system, and then run mmdeldisk to delete the disks that are to be replaced.
Note: If you place the new disks in the same failure group as the disks that are to be replaced, the above
operations maintain the data locality for the data from the disks that are being replaced. IBM Spectrum
Scale keeps the data in the original failure group.
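As an illustration, assume that two new disks /dev/sdx and /dev/sdy on node c3m3n03 are described in
an NSD stanza file /tmp/newNSD.stanza and that the NSDs data_c3m3n03_sdd and data_c3m3n03_sde
are to be replaced; the device, node, NSD, and pool names in this sketch are illustrative, and the failure
group matches the failure group of the disks that are being replaced:
%nsd:
  device=/dev/sdx
  nsd=data_c3m3n03_sdx
  servers=c3m3n03
  usage=dataOnly
  failureGroup=3,0,0
  pool=fpodata
%nsd:
  device=/dev/sdy
  nsd=data_c3m3n03_sdy
  servers=c3m3n03
  usage=dataOnly
  failureGroup=3,0,0
  pool=fpodata
After the stanza file is prepared, the disks can be exchanged as follows:
mmcrnsd -F /tmp/newNSD.stanza
mmadddisk sncfs -F /tmp/newNSD.stanza
mmdeldisk sncfs "data_c3m3n03_sdd;data_c3m3n03_sde"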
If you do not have additional idle disk slots, run the mmdeldisk command on the disks that are to be
replaced, run mmcrnsd to create the NSD disks and run mmadddisk to add the NSD disks to the file system.
You might have to run mmrestripefs -b to balance the file system but this breaks the data locality.
If the broken disks have already been restriped, they become emptied or non-functional. Run the
mmdeldisk command directly, pull out the broken disks, insert the new disks, run mmcrnsd, and then run
mmadddisk to add the new disks into the file system.
If the broken disks have not been restriped (their state might be ready/down or ready/up), take the
following steps:
1. Disable auto recovery temporarily (refer to section 2.1, step 2).
2. Pull out the broken disks directly. You can run mmlsnsd -X to check what the pulled-out disks look
like: node7_sdn C0A80A0756FBAA89 - - gpfstest7.cn.ibm.com (not found) server node
3. Insert the new disks.
4. Run mmcrnsd for the new disks (use new NSD names).
5. Run mmadddisk <fs-name> -F <new-nsd-file from step 4>.
Auto recovery
Storage pools over internal disks, whether or not FPO is enabled, are subject to frequent node and disk
failures because of the commodity hardware that is used in IBM Spectrum Scale clusters.
The IBM Spectrum Scale auto recovery feature is designed to handle random but routine node and disk
failures without requiring manual intervention. However, auto recovery cannot cover all catastrophic
outages that involve a large number of nodes and disks at once. Administrator assessment of the
situation and judgment are required to determine the cluster recovery action.
Note:
IBM Spectrum Scale recovery actions are enabled by setting the restripeOnDiskFailure configuration
option to yes. When this option is enabled, auto recovery leverages the IBM Spectrum Scale event
callback mechanism to trigger the necessary recovery actions. Specifically, the following system callbacks
are installed when restripeOnDiskFailure=yes:
v event = diskFailure action: /usr/lpp/mmfs/bin/mmcommon recoverFailedDisk %fsName %diskName
v event = nodeJoin action: /usr/lpp/mmfs/bin/mmcommon restartDownDisks %myNode %clusterManager
%eventNode
v event = nodeLeave action: /usr/lpp/mmfs/bin/mmcommon stopFailedDisk %myNode %clusterManager
%eventNode
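For example, a minimal sketch of enabling auto recovery for the cluster and verifying the setting:
mmchconfig restripeOnDiskFailure=yes
mmlsconfig restripeOnDiskFailure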
diskFailure Event
This event is triggered when a disk I/O operation fails. Upon I/O failure, IBM Spectrum Scale marks the
disk from ready/up to ready/down. The I/O failure can be caused by a disk failure or by a node failure,
because all disks that are served by the failed node become unavailable.
Recovery process
1. Perform simple checks, such as whether the pool is an FPO pool and whether the replication factor is greater than 1.
Note: If the file system version is 5.0.2 or later, the suspended disks from auto recovery are resumed
when the node with the suspended or to-be-emptied disks joins the cluster again. If the file system
version is earlier than 5.0.2, the cluster administrator must manually run mmchdisk fs-name resume -a to
resume the disks.
nodeJoin Event
This event is triggered when a node joins the cluster after a reboot, rejoins after losing membership in
the cluster, or starts after an extended outage. The scope of the recovery is all file systems to which the
node's disks might belong. In most cases, the disk state is ready/up if no I/O operation has been
performed, or ready/down. However, based on prior events, the state can also be suspended/down or
unrecovered/recovering.
Recovery process
1. Perform simple checks on the disks assigned to the file systems.
2. Check whether a tschdisk start is already running from a prior event. Kill that process so that disks
from the current nodes can be included.
3. Start all disks on all nodes by running tschdisk start -a to optimize recovery time. This command
requires all nodes in the cluster to be functioning in order to access all the disks in the file system.
4. Start all down disks on all active nodes by running tschdisk start -F <file containing disk
list>.
5. If the file system version is 5.0.2 or later, auto recovery runs mmchdisk fs-name resume -d
<suspended-disk-by-auto-recovery>. If the file system version is earlier than 5.0.2, this command is
not executed.
6. After successful completion, for file system version 5.0.2 and later, all disks must be in the ready/up
state. For file system versions earlier than 5.0.2, all disks must be in the suspended/up state.
For file system version 5.0.2 and later, if the administrator runs mmchdisk fs-name suspend -d <disks>
and these disks are not resumed by auto recovery, the administrator must resume these disks manually.
If a new diskFailure event is triggered while tschdisk start is in progress, the disks will not be restored
to the Up state until the node joins the cluster and triggers a nodeJoin event.
nodeLeave Event
This event is triggered when a node leaves the cluster, is expelled, or is shut down.
The processing of this event is similar to the diskFailure event, except that the disks might not already
be marked as down when this event is received. Note that a diskFailure event can still be generated
based on I/O activity in the cluster. If it is generated, no action is taken by the diskFailure event handler
if the owning node is also down, thereby allowing the nodeLeave event to control the recovery.
Recovery process
1. Wait for the specified duration to give the failed nodes a chance to recover.
2. Check the down node count, the down disk count, and the available data and metadata failure group
counts against the maximum limits.
3. Build a list of disks to act upon, ignoring disks that are suspended, empty, or to be emptied.
4. Run tsrestripefs to restore the replica count to the stated values.
5. After successful completion, disks can be in the suspended/down state, or no action might be taken if
the nodeJoin event is triggered within the recoveryWaitPeriod.
To get QoS support for autorecovery, you must enable QoS and assign IOPS to the maintenance and
other classes of the storage pools that you want autorecovery to restore. For more information, see
“Setting the Quality of Service for I/O operations (QoS)” on page 134.
In IBM Spectrum Scale v4.2.1.x and v4.2.2.x, the autorecovery process always runs in the QoS
maintenance class. If you assign a smaller share of IOPS to the maintenance class, this setting ensures
that autorecovery does not compete with normal processes for I/O operations.
In IBM Spectrum Scale v4.2.3 and later, the autorecovery process runs in the QoS maintenance class only
if one replica is lost. If more than one replica is lost, the autorecovery process runs in the QoS other class
so that it completes faster.
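For example, a sketch of enabling QoS for a file system and assigning an IOPS share to the maintenance
class of an FPO data pool; the file system name, pool name, and IOPS values are illustrative:
mmchqos sncfs --enable pool=fpodata,maintenance=300IOPS,other=unlimited
mmlsqos sncfs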
Restrictions
An FPO environment includes restrictions.
There might be additional limitations and restrictions. For the latest support information, see the IBM
Spectrum Scale FAQ in IBM Knowledge Center (www.ibm.com/support/knowledgecenter/STXKQY/
gpfsclustersfaq.html).
Note: GPFS encryption is only available with IBM Spectrum Scale Advanced Edition or IBM Spectrum
Scale Data Management Edition. The file system must be at GPFS V4.1 or later. Encryption is supported
in:
v Multicluster environments (provided that the remote nodes have their own /var/mmfs/etc/RKM.conf
files and access to the remote key management servers. For more information, see “Encryption keys.”)
v FPO environments
Secure storage uses encryption to make data unreadable to anyone who does not possess the necessary
encryption keys. The data is encrypted while “at rest” (on disk) and is decrypted on the way to the
reader. Only data, not metadata, is encrypted.
GPFS encryption can protect against attacks targeting the disks (for example, theft or acquisition of
improperly discarded disks) as well as attacks performed by unprivileged users of a GPFS node in a
multi-tenant cluster (that is, a cluster that stores data belonging to multiple administrative entities called
tenants). However, it cannot protect against deliberate malicious acts by a cluster administrator.
Secure data deletion leverages encryption and key management to guarantee erasure of files beyond the
physical and logical limitations of normal deletion operations. If data is encrypted, and the master key (or
keys) required to decrypt it have been deleted from the key server, that data is effectively no longer
retrievable. See “Encryption keys.”
Important: Encryption should not be viewed as a substitute for using file permissions to control user
access.
Encryption keys
GPFS uses the following types of encryption keys:
master encryption key (MEK)
An MEK is used to encrypt file encryption keys.
MEKs are stored in remote key management (RKM) servers and are cached by GPFS components.
GPFS receives information about the RKM servers in a separate /var/mmfs/etc/RKM.conf
configuration file. Encryption rules present in the encryption policy define which MEKs should
be used, and the /var/mmfs/etc/RKM.conf file provides a means of accessing those keys. The
/var/mmfs/etc/RKM.conf also specifies how to access RKMs containing MEKs used to encrypt
files created under previous encryption policies.
An MEK is identified with a unique Keyname that combines the name of the key and the RKM
server on which it resides. See “Encryption policy rules” on page 566 for Keyname format.
file encryption key (FEK)
An FEK is used to encrypt sectors of an individual file. It is a unique key that is randomly
generated when the file is created. For protection, it is encrypted (or “wrapped”) with one or
more MEKs and stored in the gpfs.Encryption extended attribute of the file.
A wrapped FEK cannot be decoded without access to the MEK (or MEKs) used to wrap it.
Therefore, a wrapped FEK is useless to an attacker and does not require any special handling.
Note: If an encryption policy specifies that an FEK be wrapped multiple times, only one of the
wrapped-FEK instances needs to be unwrapped for the file to be accessible.
Encryption policies
IBM Spectrum Scale uses encryption policies to manage aspects of how file encryption is to be
implemented, including the following:
v Which files are to be encrypted
v Which algorithm is to be used for the encryption
v Which MEK (or MEKs) are to be used to wrap the FEK of a file
Encryption policies are configured using the mmchpolicy command and are applied at file creation time.
When a file is created, encryption rules are traversed in order until one of the following occurs:
v The last rule is reached.
v The maximum number of SET ENCRYPTION rules that can be matched is reached. Currently the
maximum is eight rules.
v An ENCRYPTION EXCLUDE rule is matched.
If the file matches at least one SET ENCRYPTION rule, an FEK is generated and used to encrypt its
contents. The FEK is wrapped once for each policy that it matches, resulting in one or more versions of
the encrypted FEK being stored in the gpfs.Encryption extended attribute of the file.
Notes:
1. When an encryption policy is changed, the changes apply only to the encryption of subsequently
created files.
2. Encryption policies are defined on a per-file-system basis by a system administrator. After the
encryption policies are put in place, they can result in files in different filesets or with different names
being encrypted differently.
where:
ALGO EncParamString
specifies the encryption parameter string, which defines the following:
v encryption algorithm
v key length
v mode of operation
v key derivation function
COMBINE CombineParamString
specifies a string that defines the mode to be used to combine MEKs specified by the KEY
statement.
The following combine parameter string values are valid:
Table 46. Valid combine parameter string values
Value Description
XORHMACSHA512 Combine MEKs with a round of XOR followed by a
round of HMAC with SHA-512.
XOR Combine MEKs with a round of XOR.
WRAP WrapParamString
specifies a string that defines the encryption algorithm and the wrapping mode to be used to
wrap the FEK.
The following wrapping parameter string values are valid:
Table 47. Valid wrapping parameter string values.
Value Description
AES:KWRAP Use AES key wrap to wrap the FEK.
AES:CBCIV Use AES in CBC-IV mode to wrap the FEK.
where
KeyId
An internal identifier that uniquely identifies the key inside the RKM. Valid characters for
KeyId are the following: 'A' through 'Z'; 'a' through 'z'; '0' through '9'; and '-' (hyphen). The
minimum length of KeyId is one character; the maximum length is 42 characters.
Notes:
1. The maximum number of keys you can specify with the ENCRYPTION IS rule is eight.
2. The number of keys that can be used to encrypt a single file is permanently limited by the
inode size of the file system.
3. You cannot specify the same key more than once in a given ENCRYPTION IS rule. Also,
do not specify keys with identical values in an ENCRYPTION IS rule. Specifying the
same key or identically-valued keys could result in a security breach for your data.
SET ENCRYPTION
The SET ENCRYPTION rule is similar to the SET POOL rule. If more than one such rule is
present, all SET ENCRYPTION rules are considered and the FEK is wrapped once for each of the
rules that apply (up to the maximum of eight). As mentioned in “Encryption keys” on page 565,
if an FEK is wrapped multiple times, only one of the wrapped-FEK instances needs to be
unwrapped for the file to be accessed.
If no SET ENCRYPTION rule is applicable when a file is created, the file is not encrypted.
The syntax of the SET ENCRYPTION rule is:
RULE ’RuleName’ SET ENCRYPTION ’EncryptionSpecificationName’[, ’EncryptionSpecificationName’,...]
[FOR FILESET (’FilesetName’[,’FilesetName’]...)]
[WHERE SqlExpression]
where:
EncryptionSpecificationName
is the name of a specification defined by an ENCRYPTION IS rule.
To stop traversing policy rules at a certain point and encrypt using only those rules that have
matched up to that point, use the SET ENCRYPTION EXCLUDE rule:
RULE [’RuleName’] SET ENCRYPTION EXCLUDE
[FOR FILESET (’FilesetName’[,’FilesetName’]...)]
[WHERE SqlExpression]
| In addition to the values that are shown in Table 45 on page 567, you can also specify either of two
| default values following the ALGO keyword. These two values have the same effect as to policy, but the
| second value provides faster runtime performance in certain environments:
| v DEFAULTNISTSP800131A
| v DEFAULTNISTSP800131AFAST
| Note: You cannot use COMBINE or WRAP in a policy rule that contains a default ALGO value, because the
| default ALGO value generates its own default values for COMBINE and WRAP.
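Putting these elements together, an ENCRYPTION IS rule combined with a SET ENCRYPTION rule
might look like the following sketch. The rule names, the fileset name, the WHERE clause, and the key
name are illustrative; the ALGO, COMBINE, and WRAP values are taken from the parameter
descriptions above:
RULE 'encSpecE2' ENCRYPTION 'E2' IS
     ALGO 'AES:256:XTS:FEK:HMACSHA512'
     COMBINE 'XORHMACSHA512'
     WRAP 'AES:KWRAP'
     KEYS('KEY-0a1b2c3d:keyserver01_devG1')
RULE 'encryptProjectFiles' SET ENCRYPTION 'E2'
     FOR FILESET('projects')
     WHERE NAME LIKE '%.enc'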
Rewrapping policies
Rewrapping policies are policies that change how a set of FEKs is encrypted by changing the set of MEKs
that wrap the FEKs. Rewrapping applies only to files that are already encrypted, and the rewrapping
operation acts only on the gpfs.Encryption EA of the files. Rewrapping is done by using the
mmapplypolicy command to apply a set of policy rules containing one or more CHANGE ENCRYPTION KEYS
rules. These rules have the form:
where:
v Keyname_1 is the unique identifier of the MEK to be replaced. (See “Encryption policy rules” on page
566 for Keyname format.)
v Keyname_2 is the unique identifier of the new MEK, which replaces the old MEK identified by
Keyname_1.
v The FOR FILESET and WHERE clauses narrow down the set of affected files.
Both Keyname_1 and Keyname_2 are listed, and only the files that currently use Keyname_1 have their FEKs
rewrapped with Keyname_2. Files that do not currently use Keyname_1 are not affected by the operation.
Notes:
1. Only the first matching CHANGE ENCRYPTION KEYS rule is applied to each file. The rule rewraps
each wrapped version of the FEK that was encrypted with the MEK in the CHANGE ENCRYPTION
KEYS rule.
2. The same MEK cannot be used more than once in a particular wrapping of the FEK.
| Tip: The mmapplypolicy command always begins by scanning all of the files in the affected file system or
| fileset to discover files that meet the criteria of the policy rule. In the preceding example, the criterion is
| whether the file is encrypted with a FEK that is wrapped with the MEK Keyname_1. If your file system or
| fileset is very large, you might want to delay running mmapplypolicy until a time when the system is not
| running a heavy load of applications. For more information, see the topic “Phase one: Selecting candidate
| files” on page 391.
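As an illustration, a rewrapping policy file and its application with mmapplypolicy might look like the
following sketch. The rule name, file names, and key names are illustrative, and the exact form of the
CHANGE ENCRYPTION KEYS rule should be verified against the policy rule syntax for your release:
/* /tmp/rewrap.pol: rewrap FEKs that are currently wrapped with the old MEK */
RULE 'rewrapOldKey' CHANGE ENCRYPTION KEYS
     FROM 'KEY-old-0001:keyserver01_devG1'
     TO 'KEY-new-0002:keyserver01_devG1'
The policy is then applied with mmapplypolicy:
mmapplypolicy c1FileSystem1 -P /tmp/rewrap.pol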
Terms defined
The following table lists the versions of IBM Spectrum Scale that support encryption and the encryption
setup methods:
Table 48. Required version of IBM Spectrum Scale
IBM software                              Version              Encryption setup
IBM Spectrum Scale                        V4.1 or later        v Regular setup
v Advanced Edition                                             v Regular setup with certificate chain
v Data Management Edition                 V4.2.1 or later      Simplified setup
The next table shows the RKM server software that IBM Spectrum Scale supports.
Table 49. Remote Key Management servers
RKM server                                   Version              Type of encryption setup
IBM Security Key Lifecycle Manager (SKLM)    V2.5.0.1 or later    v Regular setup
                                                                  v Regular setup with certificate chain
IBM Security Key Lifecycle Manager (SKLM)    V2.5.0.4 or later    Simplified setup
Vormetric Data Security Manager (DSM)        V5.2.3 or later      Regular setup
where <Device> is the name of the file system. However, if your file system was migrated from an
earlier level, enter the following command to add support for fast extended attributes:
mmmigratefs <Device> --fastea
where <Device> is the name of the file system. For more information, see the topic Completing the
migration to a new level of IBM Spectrum Scale in the IBM Spectrum Scale: Concepts, Planning, and
Installation Guide.
The preparation of the RKM server depends on the RKM server product that you select and the
encryption setup that you plan to follow. For more information, see the help topic in the following list
that describes the setup of your RKM server:
v “Simplified setup: Using SKLM with a self-signed certificate” on page 577
v “Regular setup: Using SKLM with a self-signed certificate” on page 606
v “Configuring encryption with the Vormetric DSM key server” on page 626
An RKM back end defines a connection between a local key client, a remote key tenant, and an RKM
server. Each RKM back end is described in an RKM stanza in an RKM.conf file on each node that is
configured for encryption.
By controlling the contents of the RKM.conf file, the cluster administrator can control which client nodes
have access to master encryption keys (MEKs). For example, the same RKM server can be given two
Because the master encryption keys (MEK) are cached in memory, some short-term outages while
attempting to access a key server might not cause issues. However, failure to retrieve the keys might
result in errors while creating, opening, reading, or writing files. Although the keys are cached, they are
periodically retrieved from the key server to ensure their validity.
To ensure that MEKs are always available, it is a good practice to set up multiple key servers in a
high-availability configuration. See the subtopic “Adding backup RKM servers in a high-availability
configuration” on page 575.
Note: If you are using the simplified setup, then the mmkeyserv command manages its own RKM.conf
file and updates it automatically. This management includes adding any backup servers for High
Availability and other key retrieval properties. The RKM.conf file that the mmkeyserv command manages
is in the /var/mmfs/ssl/keyServ directory.
The management of the RKM.conf file and its stanzas depends on the setup:
v In the simplified setup, the mmkeyserv command manages its own RKM.conf file and updates it
automatically. This management includes adding any backup servers for High Availability and other
key retrieval properties.
v In the regular setup and the Vormetric setup, you must manage the RKM.conf file and its contents.
The location of the RKM.conf file also depends on the setup:
Table 50. The RKM.conf file
Setup Location of the RKM.conf file
Simplified setup /var/mmfs/ssl/keyServ/RKM.conf
Regular setup /var/mmfs/etc/RKM.conf
Vormetric DSM setup /var/mmfs/etc/RKM.conf
The length of the RKM.conf file cannot exceed 1 MiB. There is no limit on the number of RKM stanzas,
as long as the length limit is not exceeded.
After the file system is configured with encryption policy rules, the file system is considered encrypted.
From that point on, each node that has access to that file system must have an RKM.conf file present.
Otherwise, the file system might not be mounted or might become unmounted.
Each RKM stanza in the RKM.conf file describes a connection between a local key client, a remote tenant,
and an RKM server. The following code block shows the structure of an RKM stanza:
RKM ID {
type = ISKLM
kmipServerUri = tls://host:port
keyStore = PathToKeyStoreFile
passphrase = Password
clientCertLabel = LabelName
tenantName = NameOfTenant
[connectionTimeout = ConnectionTimeout]
[connectionAttempts = ConnectionAttempts]
[retrySleep = RetrySleepUsec]
}
| You can add up to five backup RKM servers to your configuration if necessary to improve the reliability
| or performance of master encryption key retrieval. A backup RKM server is specified by adding a line in
| the following format to the RKM stanza:
| <server_name>=<IP_address:port_number>
| The line must be added either immediately after the line that specifies the primary RKM server or
| immediately after a line that specifies another backup RKM server. In the following example, the
| maximum of five backup RKM servers is specified:
| rkmname3 {
| type = ISKLM
| kmipServerUri = tls://host:port
| kmipServerUri2 = tls://host:port # TLS connection to backup RKM server 1
| kmipServerUri3 = tls://host:port # TLS connection to backup RKM server 2
| kmipServerUri4 = tls://host:port # TLS connection to backup RKM server 3
| kmipServerUri5 = tls://host:port # TLS connection to backup RKM server 4
| kmipServerUri6 = tls://host:port # TLS connection to backup RKM server 5
| keyStore = PathToKeyStoreFile
| passphrase = Password
| clientCertLabel = LabelName
| tenantName = NameOfTenant
| [connectionTimeout = ConnectionTimeout]
| [connectionAttempts = ConnectionAttempts]
| [retrySleep = RetrySleepUsec]
| }
| Note: In the regular setup method you must add each line manually; in the simplified setup lines are
| added automatically in response to mmkeyserv commands. For more information see the following
| subtopics:
| v Regular setup: See the subtopic "Part 3: Configuring the remote key management (RKM) back end"
| in the topic “Regular setup: Using SKLM with a self-signed certificate” on page 606.
| If at least one backup RKM server is configured, then whenever key retrieval from the primary RKM
server fails, IBM Spectrum Scale queries each backup RKM server in the list, in order, until it finds the
MEK. The addition of the URIs for the backup RKM servers is the only change that is required within
IBM Spectrum Scale. All other configuration parameters (certificates, keys, node, and tenant information)
do not need to change, because they are also part of the set of information that is replicated. The
administrator is responsible for creating and maintaining any backups.
Additionally, setting up backup RKM servers can help gain some performance advantage by distributing
MEK retrieval requests across the backup RKM servers in a round-robin fashion. To achieve this result,
the administrator must specify different orderings of the server endpoints on different IBM Spectrum
Scale nodes in the /var/mmfs/etc/RKM.conf file.
For example, if two backup RKM servers are available, such as tls://keysrv.ibm.com:5696 and
tls://keysrv_backup.ibm.com:5696, half of the nodes in the cluster can have the following content in
/var/mmfs/etc/RKM.conf:
...
kmipServerUri = tls://keysrv.ibm.com:5696
kmipServerUri2 = tls://keysrv_backup.ibm.com:5696
...
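The other half of the nodes can then list the same servers in the reverse order so that key retrieval
requests are spread across both servers:
...
kmipServerUri = tls://keysrv_backup.ibm.com:5696
kmipServerUri2 = tls://keysrv.ibm.com:5696
...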
The files in the client keystore directory include the client keystore file, the public and private key files
for the client, and possibly other files that are described in later topics.
The location of the client keystore directory also depends on the setup:
Table 51. The client keystore directory
Setup Location of the client keystore directory
Simplified setup /var/mmfs/ssl/keyServ
Regular setup /var/mmfs/etc/RKMcerts
Vormetric DSM setup /var/mmfs/etc/RKMcerts
| For information about the regular setup and the simplified setup methods, see the definition of these terms
| in the topic “Preparation for encryption” on page 571.
This topic describes the simplified method for setting up encryption with IBM Security Key Lifecycle
Manager (SKLM) as the key management server and with a self-signed certificate on the KMIP port of
the SKLM server. If your deployment scenario uses a chain of certificates from a certificate authority, see
one of the following topics:
“Simplified setup: Using SKLM with a certificate chain” on page 584
“Regular setup: Using SKLM with a certificate chain” on page 614
The simplified setup with IBM Security Key Lifecycle Manager (SKLM) requires IBM Spectrum Scale
Advanced Edition or IBM Spectrum Scale Data Management Edition V4.2.1 or later and SKLM V2.5.0.4 or
later (including V2.6).
Note: If you are using SKLM v2.7 or later, see the topic “Configuring encryption with SKLM v2.7 or
later” on page 623.
Note: In the simplified setup, the mmkeyserv command sets the permission bits automatically.
Important: The client keystore must be record-locked when the GPFS daemon starts. If the keystore
files are stored on an NFS mount, the encryption initialization process can hang. The cause is a bug
that affects the way NFS handles record locking. If you encounter this problem, upgrade your version
of NFS or store your keystore file on a local file system. If an upgrade is not possible and no local file
system is available, use a RAM drive to store the keystore files.
The setup is greatly simplified by the use of the mmkeyserv command, which can communicate with and
configure the SKLM server from the IBM Spectrum Scale node. The mmkeyserv command automates the
following tasks:
v Creating and configuring the client credentials of the IBM Spectrum Scale node.
v Creating a device group and master encryption keys for the node on SKLM.
v Creating an RKM stanza in the RKM.conf configuration file.
v Retrieving a server certificate from SKLM and storing it in the PKCS#12 keystore of the client.
v Propagating the encryption configuration and credentials to all the nodes in the IBM Spectrum Scale
cluster.
The command returns yes if the cluster complies with FIPS or no if not.
b. On the SKLM server system, open the SKLMConfig.properties file.
Note: The default location of the SKLMConfig.properties file depends on the operating system:
v On AIX, Linux, and similar operating systems the directory is at the following location:
/opt/IBM/WebSphere/AppServer/products/sklm/config/SKLMConfig.properties
v On Microsoft Windows the directory is at the following location:
Drive:\Program Files (x86)\IBM\WebSphere\AppServer\products\sklm\config\
SKLMConfig.properties
| c. Find the line in the SKLMConfig.properties file that reads either
| fips=on
| or
| fips=off
| If the line is not present in the file, then add it. Set the value to on to configure SKLM to comply
| with FIPS, or set it to off to configure SKLM not to comply with FIPS.
4. Configure the SKLM server to have the same NIST SP800-131a (NIST) setting as the IBM Spectrum
Scale cluster. Follow these steps:
a. Determine the NIST setting of the cluster by entering the following command on the command
line:
mmlsconfig nistCompliance
The command returns SP800-131A if the cluster complies with NIST or off if not.
b. On the SKLM server system, open the SKLMConfig.properties file. For the location of this file, see
the note in Step 3.
| c. Find the line in the SKLMConfig.properties file that begins with the following phrase:
| TransportListener.ssl.protocols=
| If the line is not present in the file, then add it. To configure SKLM to comply with NIST, set the
| value as it is shown below:
| TransportListener.ssl.protocols=TLSv1.2
| To configure SKLM not to comply with NIST, set the value as it is shown below:
| TransportListener.ssl.protocols=SSL_TLS
d. For all V2.5.0.x versions of SKLM, if you are configuring SKLM to comply with NIST, modify the
following variable to include only cipher suites that are approved by NIST. The following
statement is all on one line, with no space before or after the comma:
TransportListener.ssl.ciphersuites=TLS_RSA_WITH_AES_256_CBC_SHA256,
TLS_RSA_WITH_AES_128_CBC_SHA256
5. Configure IBM WebSphere® Application Server so that it has the same NIST setting as the IBM
Spectrum Scale cluster. See the topic Transitioning WebSphere Application Server to the SP800-131
security standard in the volume WebSphere Application Server Network Deployment in the WebSphere
Application Server online documentation.
v WebSphere Application Server can be configured to run SP800-131 in a transition mode or a
strict mode. The strict mode is recommended.
v When NIST is enabled, make sure that the WebSphere Application Server certificate size is at least 2048
bits and that it is signed with SHA256withRSA, as described in the preceding link.
6. If the cipher suites were set at any time, SKLM 2.6.0.0 has a known issue that causes server
certificates always to be signed with SHA1withRSA. To work around the problem, follow these steps:
a. While the SKLM server is running, in the SKLMConfig.properties file, modify the
requireSHA2Signatures property as follows:
requireSHA2Signatures=true
b. Do not restart the server.
c. Generate a new server certificate and set it to be the one in use.
d. If you restart the server, you must repeat this workaround before you can create a server
certificate that is signed other than with SHA1withRSA.
7. Create a self-signed server certificate:
a. On the system where SKLM is running, open the graphical user interface.
b. Click Configuration > SSL/KMIP.
The following table provides an overview of the configuration process. The steps in the table correspond
to the steps in the procedure that begins immediately after the table.
Table 52. Configuring a node for encryption in the simplified setup
Step number Steps
Step 1 Verify the direct network connection between the IBM Spectrum Scale node and the SKLM
server.
Step 2 Add the SKLM key server to the configuration.
Step 3 Add a tenant to the key server.
Step 4 Create a key client.
Step 5 Register the key client to the tenant.
Step 6 Create a master encryption key in the tenant.
Step 7 Set up an encryption policy in the cluster.
Step 8 Test the encryption policy.
where ServerName is the host name or IP address of the SKLM key server that you want to add.
See the example listing in Figure 17 on page 581. You can also specify the REST port number of
the SKLM key server:
mmkeyserv server add ServerName --port RestPortNumber
The default REST port number is 443 for SKLM v2.7 and later and 9080 for SKLM v2.6 and earlier.
b. Enter the password for the SKLM server when prompted.
c. To view the certificate chain of the SKLM server, enter view when prompted.
d. Verify that the certificates that are displayed have the same contents as the certificates in the chain
that you downloaded from SKLM.
e. Enter yes to trust the certificates or no to reject them. If you trust the certificates, the command
adds the key server object to the configuration. In the following listing, key server keyserver01 is
added:
f. Issue the mmkeyserv server show command to verify that the key server is added. The following
listing shows that keyserver01 is created:
# mmkeyserv server show
keyserver01
Type: ISKLM
Hostname: keyserver01.gpfs.net
User ID: SKLMAdmin
REST port: 9080
Label: 1_keyserver01
NIST: on
FIPS1402: off
Backup Key Servers:
Distribute: yes
Retrieval Timeout: 120
Retrieval Retry: 3
Retrieval Interval: 10000
3. Issue the mmkeyserv tenant add command to add a tenant to the key server. The command creates
the tenant on the SKLM server if it does not exist. A tenant is an entity on the SKLM server that can
contain encryption keys and certificates. SKLM uses the term device group instead of tenant.
a. Issue the following command to add tenant devG1 to key server keyserver01. Enter the password
for the SKLM server when prompted:
# mmkeyserv tenant add devG1 --server keyserver01
Enter password for the key server keyserver01:
b. Issue the mmkeyserv tenant show command to verify that the tenant is added. The following
listing shows that tenant devG1 is added to keyserver01:
For example, the RKM ID for the key server and the tenant in these instructions is keyserver01_devG1.
a. Issue the following command to register key client c1Client1 with tenant devG1 under RKM ID
keyserver01_devG1. Enter the requested information when prompted:
# mmkeyserv client register c1Client1 --tenant devG1 --rkm-id keyserver01_devG1
Enter password for the key server:
mmkeyserv: [I] Client currently does not have access to the key. Continue the registration
process ...
mmkeyserv: Successfully accepted client certificate
b. Issue the command mmkeyserv tenant show to verify that the key client is known to the tenant.
The following listing shows that tenant devG1 lists c1Client1 as a registered client:
mmkeyserv tenant show
devG1
Key Server: keyserver01.gpfs.net
Registered Client: c1Client1
c. You can also issue the command mmkeyserv client show to verify that the tenant is known to the
client. The following listing shows that client c1Client1 is registered with tenant devG1:
# mmkeyserv client show
c1Client1
Label: c1Client1
Key Server: keyserver01.gpfs.net
Tenants: devG1
d. To see the contents of the RKM stanza, issue the mmkeyserv rkm show command. In the
following listing, notice that the RKM ID of the stanza is keyserver01_devG1, the string that was
specified in Step 5(a):
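A policy of the kind created for this example might look like the following sketch; the rule names and
the WHERE clause are illustrative, and the key name combines the key UUID from Step 6 with the RKM
ID keyserver01_devG1:
RULE 'p1' SET POOL 'system' /* default placement rule */
RULE 'Encrypt files with extension enc with rule E1'
SET ENCRYPTION 'E1'
WHERE NAME LIKE '%.enc'
RULE 'simpleEncRule' ENCRYPTION 'E1' IS
ALGO 'DEFAULTNISTSP800131A'
KEYS('KEY-d4e83148-e827-4f54-8e5b-5e1b5cc66de1:keyserver01_devG1')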
In the last line of the policy, the character string within single quotation marks (') is the key name.
A key name is a compound of two parts in the following format:
KeyID:RkmID
where:
KeyID Specifies the UUID of the key that you created in Step 6.
RkmID Specifies the RKM ID that you specified in Step 5(a).
| Note: In line six of the preceding example, the default parameter DEFAULTNISTSP800131AFAST can
| be substituted for the default parameter DEFAULTNISTSP800131A following the ALGO keyword. The
| two parameters have the same effect as to policy, but the second value provides faster runtime
| performance in certain environments. For more information see “Encryption policy rules” on page
| 566.
b. Issue the mmchpolicy command to install the rule.
CAUTION:
Installing a new policy with the mmchpolicy command removes all the statements in the
previous policy. To add statements to an existing policy without deleting the previous contents,
collect all policy statements for the file system into one file. Add the new statements to the file
and install the contents of the file with mmchpolicy.
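A minimal sketch of installing such a policy and then creating a file that matches it; the policy file path
and the file contents are illustrative:
mmchpolicy c1FileSystem1 /tmp/enc.pol
echo "hello world" > /c1Filesystem1/hw.enc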
The policy engine detects the new file, encrypts it, and wraps the file encryption key in a master
encryption key.
b. To verify that the file hw.enc is encrypted, issue the following command to display the encryption
attribute of the file. The output shows that the file is encrypted:
# mmlsattr -n gpfs.Encryption /c1Filesystem1/hw.enc
file name: /c1Filesystem1/hw.enc
gpfs.Encryption: "EAGC????.?????????????? ??????h????????????????? ?u?~?}????????????t??lN??
’k???*?3??C???#?)?KEY-ef07b465-cfa5-4476-9f63-544e4b3cc119?NewGlobal11?"
EncPar ’AES:256:XTS:FEK:HMACSHA512’
type: wrapped FEK WrpPar ’AES:KWRAP’ CmbPar ’XORHMACSHA512’
KEY-d4e83148-e827-4f54-8e5b-5e1b5cc66de1:keyserver01_devG1
This topic describes the simplified method for setting up encryption with IBM Security Key Lifecycle
Manager (SKLM) as the key management server and with a certificate signed by a certificate authority
(CA) on the KMIP port of the SKLM server. If your deployment scenario uses a self-signed server
certificate, see one of the following topics:
“Simplified setup: Using SKLM with a self-signed certificate” on page 577
“Regular setup: Using SKLM with a self-signed certificate” on page 606
The simplified setup with IBM Security Key Lifecycle Manager (SKLM) requires IBM Spectrum Scale
Advanced Edition or IBM Spectrum Scale Data Management Edition V4.1 or later and SKLM V2.5.0.1 or
later (including V2.6).
Note: If you are using SKLM v2.7 or later, see the topic “Configuring encryption with SKLM v2.7 or
later” on page 623.
Note: In the simplified setup, the mmkeyserv command sets the permission bits automatically.
v CAUTION:
It is a good practice to take the following precautions:
– Ensure that the passphrase for the client certificate file is not leaked through other means, such
as the shell history.
– Take appropriate precautions to ensure that the security-sensitive files are not lost or corrupted.
IBM Spectrum Scale does not manage or replicate the files.
v
Important: The client keystore must be record-locked when the GPFS daemon starts. If the keystore
files are stored on an NFS mount, the encryption initialization process can hang. The cause is a bug
that affects the way NFS handles record locking. If you encounter this problem, upgrade your version
of NFS or store your keystore file on a local file system. If an upgrade is not possible and no local file
system is available, use a RAM drive to store the keystore files.
The setup is greatly simplified by the use of the mmkeyserv command, which can communicate with and
configure the SKLM server from the IBM Spectrum Scale node. The mmkeyserv command automates the
following tasks:
v Creating and configuring the client credentials of the IBM Spectrum Scale node.
v Creating a device group and master encryption keys for the node on SKLM.
v Creating an RKM stanza in the RKM.conf configuration file.
v Retrieving a server certificate from SKLM and storing it in the PKCS#12 keystore of the client.
v Propagating the encryption configuration and credentials to all the nodes in the IBM Spectrum Scale
cluster.
The command returns yes if the cluster complies with FIPS or no if not.
b. On the SKLM server system, open the SKLMConfig.properties file.
Note: The default location of the SKLMConfig.properties file depends on the operating system:
v On AIX, Linux, and similar operating systems the directory is at the following location:
/opt/IBM/WebSphere/AppServer/products/sklm/config/SKLMConfig.properties
v On Microsoft Windows the directory is at the following location:
Drive:\Program Files (x86)\IBM\WebSphere\AppServer\products\sklm\config\
SKLMConfig.properties
c. Add or remove the following line from the SKLMConfig.properties file. Add the line to configure
SKLM to comply with FIPS, or remove it to have SKLM not comply with FIPS.
fips=on
4. Configure the SKLM server to have the same NIST SP800-131a (NIST) setting as the IBM Spectrum
Scale cluster. Follow these steps:
a. Determine the NIST setting of the cluster by entering the following command on the command
line:
mmlsconfig nistCompliance
The command returns SP800-131A if the cluster complies with NIST or off if not.
b. On the SKLM server system, open the SKLMConfig.properties file. For the location of this file, see
the note in Step 3.
c. Add the following line to configure SKLM to comply with NIST or remove it to configure SKLM
not to comply with NIST:
TransportListener.ssl.protocols=TLSv1.2
d. For all V2.5.0.x versions of SKLM, if you are configuring SKLM to comply with NIST, modify the
following variable to include only cipher suites that are approved by NIST. The following
statement is all on one line, with no space before or after the comma:
TransportListener.ssl.ciphersuites=TLS_RSA_WITH_AES_256_CBC_SHA256,
TLS_RSA_WITH_AES_128_CBC_SHA256
5. Configure IBM WebSphere Application Server so that it has the same NIST setting as the IBM
Spectrum Scale cluster. See the topic Transitioning WebSphere Application Server to the SP800-131
security standard in the volume WebSphere Application Server Network Deployment in the WebSphere
Application Server online documentation.
v WebSphere Application Server can be configured to run SP800-131 in a transition mode or a
strict mode. The strict mode is recommended.
v When NIST is enabled, make sure that the WebSphere Application Server certificate size is at least 2048
bits and that it is signed with SHA256withRSA, as described in the preceding link.
6. If the cipher suites were set at any time, SKLM 2.6.0.0 has a known issue that causes server
certificates always to be signed with SHA1withRSA. To work around the problem, follow these steps:
a. While the SKLM server is running, in the SKLMConfig.properties file, modify the
requireSHA2Signatures property as follows:
requireSHA2Signatures=true
b. Do not restart the server.
c. Generate a new server certificate signing request (CSR) to a third-party certificate authority (CA)
and send it to the CA.
d. When you receive the certificate from the third-party CA, import it into SKLM and set it to be the
certificate in use. For more information, see the next subtopic.
Note: For more information about the steps in this subtopic, see the steps that are described in the SKLM
documentation, in the topic "Scenario: Request for a third-party certificate" at http://www.ibm.com/
support/knowledgecenter/en/SSWPVP_2.7.0/com.ibm.sklm.doc/scenarios/cpt/
cpt_ic_scenar_ca_certusage.html.
1. Create a certificate signing request (CSR) with the SKLM command-line interface:
a. On the SKLM server system, open a command-line window.
b. Change to the WAS_HOME/bin directory. The location of this directory depends on the operating
system:
v On AIX, Linux, and similar operating systems the directory is at the following location:
/opt/IBM/WebSphere/AppServer/bin
v On Microsoft Windows the directory is at the following location:
drive:\Program Files (x86)\IBM\WebSphere\AppServer\bin
c. Start the command-line interface to SKLM:
v On AIX, Linux, and similar operating systems, enter the following command:
./wsadmin.sh -username SKLMAdmin -password mypwd -lang jython
v On Microsoft Windows, enter the following command:
wsadmin -username SKLMAdmin -password mypwd -lang jython
d. In the SKLM command-line interface, enter the following command on one line:
print AdminTask.tklmCertGenRequest(’[-alias labelCsr -cn server
-validity daysValid -keyStoreName defaultKeyStore -fileName fileName -usage SSLSERVER]’)
where:
-alias labelCsr
Specifies the certificate label of the CSR.
-cn server
Specifies the common name of the server in the certificate.
-validity daysValid
Specifies the validity period of the certificate in days.
-keyStoreName defaultKeyStore
Specifies the keystore name within SKLM where the CSR is stored. Typically, you should
specify defaultKeyStore as the name here.
-fileName fileName
Specifies the fully qualified path of the directory where the CSR is stored on the SKLM
server system, for example /root/sklmServer.csr.
-usage SSLSERVER
Specifies how the generated certificate is used in SKLM.
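For example, a request for a server certificate that is stored in /root/sklmServer.csr and is valid for three
years might look like the following sketch; the alias and the common name are illustrative:
print AdminTask.tklmCertGenRequest('[-alias sklmSrvCsr -cn keyserver01.gpfs.net -validity 1095
-keyStoreName defaultKeyStore -fileName /root/sklmServer.csr -usage SSLSERVER]')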
Important: You must also obtain and copy the intermediate certificate files of the certificate chain of
authority into the same temporary directory. The intermediate certificates might be included with the
generated certificate file, or you might have to obtain the intermediate certificates separately.
Whatever the method, you must have a separate certificate file for the root certificate and for each
intermediate certificate in the chain of authority. You need these certificate files in Part 3.
4. Import the root certificate into the SKLM server with the SKLM graphical user interface:
a. On the Welcome page, in the Action Items section, in the Key Groups and Certificates area, click
You have pending certificates.
b. In the Pending Certificates table, click the certificate that you want to import and click Import.
c. In the File name and location field, type the path and file name of the certificate file and click
Import.
The following table provides a high-level overview of the configuration process. The steps in the table
correspond to the steps in the procedure that begins immediately after the table.
Table 53. Configuring a node for encryption in the simplified setup
Step number Steps
Step 1 Verify the direct network connection between the IBM Spectrum Scale node and the SKLM
server.
Step 2 Add the SKLM key server to the configuration.
Step 3 Add a tenant to the key server.
Step 4 Create a key client.
Step 5 Register the key client to the tenant.
Step 6 Create a master encryption key in the tenant.
Step 7 Set up an encryption policy in the cluster.
Step 8 Test the encryption policy.
1. Verify that the IBM Spectrum Scale node that you are configuring for encryption has a direct network
connection to the system on which the SKLM key server runs.
2. Use the mmkeyserv server add command to add the SKLM key server from Part I to the
configuration:
a. Add the key server to the current configuration:
1) Copy the files in the server certificate chain into a directory on the node that you are
configuring. A good location is the same directory in which the keystore.pwd file is located.
2) Rename each file with the same file name prefix, followed by an increasing integer value that
indicates the order of the certificate in the chain, followed by the suffix .cert. Start the
numbering with 0 for the root certificate.
The following example shows the renamed certificate files for a server certificate chain. The
chain consists of a root CA certificate, one intermediate certificate, and an endpoint certificate.
The files are in directory /root and the file name prefix is sklmChain:
/root/sklmChain0.cert (Root certificate)
/root/sklmChain1.cert (Intermediate certificate)
/root/sklmChain2.cert (Endpoint certificate)
where:
ServerName
Is the host name or IP address of the SKLM key server that you want to add.
CertFilesPrefix
Specifies the path and the file name prefix of the files in the certificate chain. For the
files from the example in the previous step, the path and file name prefix is
/root/sklmChain.
For more information, see the topic mmkeyserv command in the IBM Spectrum Scale:
Command and Programming Reference.
b. Issue the following command:
mmkeyserv server add ServerName
where ServerName is the host name or IP address of the SKLM key server that you want to add.
See the example listing in Figure 18 on page 590. You can also specify the REST port number of
the SKLM key server:
mmkeyserv server add ServerName --port RestPortNumber
The default REST port number is 443 for SKLM v2.7 and later and 9080 for SKLM v2.6 and earlier.
c. Enter the password for the SKLM server when prompted.
d. To view the certificate chain of the SKLM server, enter view when prompted.
e. Verify that the certificates that are displayed have the same contents as the certificates in the chain
that you downloaded from SKLM.
f. Enter yes to trust the certificates or no to reject them. If you trust the certificates, the command
adds the key server object to the configuration. In the following listing, key server keyserver01 is
added:
g. Issue the mmkeyserv server show command to verify that the key server is added. The following
listing shows that keyserver01 is created:
# mmkeyserv server show
keyserver01
Type: ISKLM
Hostname: keyserver01.gpfs.net
User ID: SKLMAdmin
REST port: 9080
Label: 1_keyserver01
NIST: on
FIPS1402: off
Backup Key Servers:
Distribute: yes
Retrieval Timeout: 120
Retrieval Retry: 3
Retrieval Interval: 10000
3. Issue the mmkeyserv tenant add command to add a tenant to the key server. The command creates
the tenant on the SKLM server if it does not exist. A tenant is an entity on the SKLM server that can
contain encryption keys and certificates. SKLM uses the term device group instead of tenant.
a. Issue the following command to add tenant devG1 to key server keyserver01. Enter the password
for the SKLM server when prompted:
# mmkeyserv tenant add devG1 --server keyserver01
Enter password for the key server keyserver01:
b. Issue the mmkeyserv tenant show command to verify that the tenant is added. The following
listing shows that tenant devG1 is added to keyserver01:
For example, the RKM ID for the key server and the tenant in these instructions is keyserver01_devG1.
a. Issue the following command to register key client c1Client1 with tenant devG1 under RKM ID
keyserver01_devG1. Enter the requested information when prompted:
# mmkeyserv client register c1Client1 --tenant devG1 --rkm-id keyserver01_devG1
Enter password for the key server:
mmkeyserv: [I] Client currently does not have access to the key. Continue the registration
process ...
mmkeyserv: Successfully accepted client certificate
b. Issue the command mmkeyserv tenant show to verify that the key client is known to the tenant.
The following listing shows that tenant devG1 lists c1Client1 as a registered client:
# mmkeyserv tenant show
devG1
Key Server: keyserver01.gpfs.net
Registered Client: c1Client1
c. You can also issue the command mmkeyserv client show to verify that the tenant is known to the
client. The following listing shows that client c1Client1 is registered with tenant devG1:
# mmkeyserv client show
c1Client1
Label: c1Client1
Key Server: keyserver01.gpfs.net
Tenants: devG1
d. To see the contents of the RKM stanza, issue the mmkeyserv rkm show command. In the
following listing, notice that the RKM ID of the stanza is keyserver01_devG1, the string that was
specified in Step 5(a):
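(The following listing is an illustration only; the keystore path, passphrase, and certificate label are the values that the mmkeyserv command generates in this topic's examples, and the same stanza appears again in the listings under “Adding a tenant” later in this chapter.)
# mmkeyserv rkm show
keyserver01_devG1 {
type = ISKLM
kmipServerUri = tls://192.168.40.59:5696
keyStore = /var/mmfs/ssl/keyServ/serverKmip.1_keyserver01.c1Client1.1.p12
passphrase = pw_c1Client1
clientCertLabel = label_c1Client1
tenantName = devG1
}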
In the last line of the policy, the character string within single quotation marks (') is the key name.
A key name is a compound of two parts in the following format:
KeyID:RkmID
where:
KeyID Specifies the UUID of the key that you created in Step 6.
RkmID Specifies the RKM ID that you specified in Step 5(a).
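For reference, a policy file of the kind that Step 7 installs might contain rules like the following sketch. The rule structure follows the example policy that is shown in the regular setup later in this chapter, and the key UUID is the one that appears in the Step 8 output and in the remote-access example for this file system:
RULE 'p1' SET POOL 'system' # one placement rule is required at all times
RULE 'Encrypt all files in file system with rule E1'
SET ENCRYPTION 'E1'
WHERE NAME LIKE '%'
RULE 'simpleEncRule' ENCRYPTION 'E1' IS
ALGO 'DEFAULTNISTSP800131A'
KEYS('KEY-d4e83148-e827-4f54-8e5b-5e1b5cc66de1:keyserver01_devG1')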
b. Issue the mmchpolicy command to install the rule.
CAUTION:
Installing a new policy with the mmchpolicy command removes all the statements in the
previous policy. To add statements to an existing policy without deleting the previous contents,
collect all policy statements for the file system into one file. Add the new statements to the file
and install the contents of the file with the mmchpolicy command.
1) Issue the following command to install the policy rules in file enc.pol for file system
c1FileSystem1:
# mmchpolicy c1FileSystem1 /tmp/enc.pol
Validated policy `enc.pol': Parsed 3 policy rules.
Policy `enc.pol' installed and broadcast to all nodes.
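To test the policy, create a new file in the file system. For example (the mount point /c1Filesystem1 matches the output in the next substep; the file contents are illustrative):
# echo 'Hello World!' > /c1Filesystem1/hw.enc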
The policy engine detects the new file, encrypts it, and wraps the file encryption key in a master
encryption key.
b. To verify that the file hw.enc is encrypted, issue the following command to display the encryption
attribute of the file. The output shows that the file is encrypted:
# mmlsattr -n gpfs.Encryption /c1Filesystem1/hw.enc
file name: /c1Filesystem1/hw.enc
gpfs.Encryption: "EAGC????.?????????????? ??????h????????????????? ?u?~?}????????????t??lN??
’k???*?3??C???#?)?KEY-ef07b465-cfa5-4476-9f63-544e4b3cc119?NewGlobal11?"
EncPar ’AES:256:XTS:FEK:HMACSHA512’
type: wrapped FEK WrpPar ’AES:KWRAP’ CmbPar ’XORHMACSHA512’
KEY-d4e83148-e827-4f54-8e5b-5e1b5cc66de1:keyserver01_devG1
With a single cluster and a single key server, the following rules apply:
v A single key client can register with more than one tenant.
v However, two or more key clients cannot register with the same tenant.
With multiple clusters and a single key server, more than one key client can register with a tenant if the
key clients are in different clusters.
With a single cluster and multiple key servers, the following rules apply:
v Different key clients in the same cluster can register with different tenants in the same key server.
v But a single key client cannot register with tenants in different key servers.
This topic shows how to configure a cluster so that it can mount an encrypted file system that is in
another cluster. In the examples in this topic, the encrypted file system is c1FileSystem1 and its cluster is
Cluster1. The cluster that mounts the encrypted file system is Cluster2.
The examples assume that Cluster1 and c1FileSystem1 are the cluster and file system that you
configured in the topic “Simplified setup: Using SKLM with a self-signed certificate” on page 577. You
configured Cluster1 for encryption and you created a policy that caused all the files in c1FileSystem1 to be
encrypted.
To configure Cluster2 with remote access to an encrypted file in Cluster1, you must configure Cluster2
for encryption in much the same way that Cluster1 was configured. As the following table shows,
Cluster2 must add the same key server and tenant as Cluster1. However, Cluster2 must create its own
key client and register it with the tenant.
Note: In the third column of the table, items in square brackets are connected or added during this topic.
The fourth column shows the step in which each item in the third column is added.
Table 54. Setup of Cluster1 and Cluster2
Item                                      Cluster1                   Cluster2                        Steps
File system                               c1FileSystem1              [c1FileSystem1_Remote]          Step 1
Connected to a key server                 keyserver01                [keyserver01]                   Step 2
Connected to a tenant                     c1Tenant1 on keyserver01   [c1Tenant1 on keyserver01]      Step 3
Created a key client                      c1Client1                  [c2Client1]                     Step 4
Registered the key client to the tenant   c1Client1 to c1Tenant1     [c2Client1 to c1Tenant1]        Step 5
The encrypted file hw.enc is in c1FileSystem1 on Cluster1. To configure Cluster2 to have remote access
to file hw.enc, follow these steps:
1. From a node in Cluster2, connect to the remote Cluster1:
a. To set up access to the remote cluster and file system, follow the instructions in topic Chapter 25,
“Accessing a remote GPFS file system,” on page 347.
b. Run the mmremotefs add command to make the remote file system c1FileSystem1 known to the
local cluster, Cluster2:
Note: c1FileSystem1_Remote is the name by which the remote file system c1FileSystem1 is known
to Cluster2.
# mmremotefs add c1FileSystem1_Remote -f c1FileSystem1 -C Cluster1.gpfs.net -T
/c1FileSystem1_Remote -A no
mmremotefs: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
Tue Mar 29 06:38:07 EDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started.
Note: After you have completed Step 1(b) and mounted the remote file system, if you try to
access the contents of file hw.enc from Cluster2, the command fails because the local cluster does
not have the master encryption key for the file:
# cat /c1FileSystem1_Remote/hw.enc
cat: hw.enc: Operation not permitted
mmfs.log:
Tue Mar 29 06:39:27.306 2016: [E]
Key ’KEY-d4e83148-e827-4f54-8e5b-5e1b5cc66de1:keyserver01_devG1’
could not be fetched. The specified RKM ID does not exist;
check the RKM.conf settings.
2. From a node in Cluster2, connect to the same SKLM key server, keyserver01, that Cluster1 is
connected to:
a. Run the mmkeyserv server add command to connect to keyserver01:
# mmkeyserv server add keyserver01
Enter password for the key server keyserver01:
The security certificate(s) from keyserver01.gpfs.net must be accepted to continue.
View the certificate(s) to determine whether you want to trust the certifying authority.
Do you want to view or trust the certificate(s)? (view/yes/no) view
For the first three tasks in this topic, you need the password for your SKLM key server.
“Creating encryption keys”
“Adding a tenant”
“Managing another key server” on page 601
“Adding backup key servers” on page 605
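For example, to create five more master encryption keys in tenant devG1 on keyserver01, a command of roughly the following form can be used (a sketch that assumes the mmkeyserv key create command and its --count option; see the mmkeyserv command topic in the IBM Spectrum Scale: Command and Programming Reference):
# mmkeyserv key create --server keyserver01 --tenant devG1 --count 5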
The command displays the UUIDs of the previously existing key and the five new keys.
Adding a tenant
A tenant is a container that resides on a key server and contains encryption keys. Before a key client can
request master encryption keys from a key server, you must add a tenant to the key server, create a key
client, and register the key client with the tenant. For more information, see “Simplified setup: Using
SKLM with a self-signed certificate” on page 577.
In some situations, you might need to access more than one tenant on the same key server. For example,
if you have several key clients that you want to use with the same key server, each key client must
register with a different tenant. For more information, see “Simplified setup: Valid and invalid
configurations” on page 593.
This task shows how to add a tenant, register an existing key client with the tenant, and create
encryption keys in the tenant.
1. Add the tenant:
a. Add a tenant devG2 on keyserver01:
# mmkeyserv tenant add devG2 --server keyserver01
Enter password for the key server keyserver01:
b. Verify that the tenant is added. The following command displays all the existing tenants:
# mmkeyserv tenant show
devG1
Key Server: keyserver01.gpfs.net
Registered Client: c1Client1
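You then register the existing key client with the new tenant in the same way as in the simplified setup. For example (a sketch; the RKM ID shown here matches the keyserver01_devG2 stanza in the listing that follows):
# mmkeyserv client register c1Client1 --tenant devG2 --rkm-id keyserver01_devG2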
The command output shows that c1Client1 is registered to both devG1 and the new devG2.
c. Verify the configuration of the RKM stanza. The following command displays all the RKM stanzas:
# mmkeyserv rkm show
keyserver01_devG1 {
type = ISKLM
kmipServerUri = tls://192.168.40.59:5696
keyStore = /var/mmfs/ssl/keyServ/serverKmip.1_keyserver01.c1Client1.1.p12
passphrase = pw_c1Client1
clientCertLabel = label_c1Client1
tenantName = devG1
}
keyserver01_devG2 {
type = ISKLM
kmipServerUri = tls://192.168.40.59:5696
keyStore = /var/mmfs/ssl/keyServ/serverKmip.1_keyserver01.c1Client1.1.p12
passphrase = pw_c1Client1
clientCertLabel = label_c1Client1
tenantName = devG2
}
1. Install and configure IBM Security Key Lifecycle Manager (SKLM). For more information, see the
topic “Simplified setup: Using SKLM with a self-signed certificate” on page 577.
2. Add the key server, keyserver11. If backup key servers are available, you can add them now. You can
have up to five backup key servers.
a. Add keyserver11 and backup key servers keyserver12 and keyserver13. Enter the requested
information when prompted:
# mmkeyserv server add keyserver11 --backup keyserver12,keyserver13
Enter password for the key server keyserver11:
The security certificate(s) from keyserver11.gpfs.net must be accepted to continue.
View the certificate(s) to determine whether you want to trust the certifying authority.
Do you want to view or trust the certificate(s)? (view/yes/no) view
The command shows two key servers, keyserver01 and keyserver11.
3. Add a tenant to the key server. The name of the tenant must be unique within the same key server,
but it can be the same as the name of a tenant in another key server:
a. Add the tenant devG1 to keyserver11:
mmkeyserv tenant add devG1 --server keyserver11
Enter password for the key server keyserver11:
b. Verify that the tenant is added:
mmkeyserv tenant show
devG1
Key Server: keyserver01.gpfs.net
Registered Client: c1Client1
devG2
Key Server: keyserver01.gpfs.net
Registered Client: c1Client1
devG1
Key Server: keyserver11.gpfs.net
Registered Client: (none)
Note: A key client name must be 1-16 characters in length and must be unique within an IBM
Spectrum Scale cluster.
a. Create c1Client11 on keyserver11.
# mmkeyserv client create c1Client11 --server keyserver11
Enter password for the key server keyserver11:
Create a pass phrase for keystore:
Confirm your pass phrase:
b. Verify that the client is created. The command shows all the existing key clients:
# mmkeyserv client show
c1Client1
Label: c1Client1
Key Server: keyserver01.gpfs.net
Tenants: devG1,devG2
c1Client11
Label: c1Client11
Key Server: keyserver11.gpfs.net
Tenants: (none)
Important: IBM Spectrum Scale does not manage backup key servers. You must configure them and
maintain them.
| Note: For information about using backup key servers, see the subtopic "Adding backup RKM servers in
| a high-availability configuration" in “Preparation for encryption” on page 571.
This task shows how to add backup key servers to the RKM stanza of one of your key clients. You can
add backup key servers when you create a key server, as shown in Step 2 of the previous subtopic. Or
you can add them later, as in this subtopic.
In this task the primary key server is keyserver11. The backup key servers for the RKM stanza are
keyserver12 and keyserver13. You want to add three more backup key servers to the list: keyserver14,
keyserver15, and keyserver16.
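For illustration, the complete backup list can be specified in a single command of roughly the following form (a sketch that assumes the mmkeyserv server update command and its --backup option, as described in the mmkeyserv command topic in the IBM Spectrum Scale: Command and Programming Reference):
# mmkeyserv server update keyserver11 --backup keyserver12,keyserver13,keyserver14,keyserver15,keyserver16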
Attention:
v You can change the order in which the client tries backup key servers, by running the same
command with the key servers in a different order.
v You can delete backup key servers by specifying a list that contains the backup key servers that
you want to keep and omits the ones that you want to delete.
2. To verify, issue the mmkeyserv rkm show command to display the RKM stanzas:
# mmkeyserv rkm show
keyserver01_devG1 {
type = ISKLM
kmipServerUri = tls://192.168.40.59:5696
keyStore = /var/mmfs/ssl/keyServ/serverKmip.1_keyserver01.c1Client1.1.p12
passphrase = pw4c1Client1
clientCertLabel = c1Client1
tenantName = devG1
}
keyserver01_devG2 {
type = ISKLM
kmipServerUri = tls://192.168.40.59:5696
keyStore = /var/mmfs/ssl/keyServ/serverKmip.1_keyserver01.c1Client1.1.p12
passphrase = pw4c1Client1
clientCertLabel = c1Client1
tenantName = devG2
}
keyserver11_devG1 {
type = ISKLM
kmipServerUri = tls://keyserver11.gpfs.net:5696
kmipServerUri12 = tls://keyserver12.gpfs.net:5696
kmipServerUri13 = tls://keyserver13.gpfs.net:5696
This topic describes the regular method for setting up encryption with IBM Security Key Lifecycle Manager
(SKLM) as the key management server, using a self-signed certificate on the KMIP port of the SKLM
server. If your deployment scenario requires the use of a chain of server certificates from a Certificate
Authority, see the topic “Regular setup: Using SKLM with a certificate chain” on page 614.
Note: If you are using SKLM v2.7 or later, see the topic “Configuring encryption with SKLM v2.7 or
later” on page 623.
Requirements:
The following requirements must be met on every IBM Spectrum Scale node that you configure for
encryption:
v The node must have direct network access to the system where the key server is installed.
v The security-sensitive files that are created during the configuration process must have the following
characteristics:
– They must be regular files that are owned by the root user.
– They must be in the root group.
– They must be readable and writable only by the user (mode '0600'). The following examples apply
to the regular setup and the Vormetric DSM setup:
-rw-------. 1 root root 2446 Mar 20 12:15 /var/mmfs/etc/RKM.conf
drw-------. 2 root root 4096 Mar 20 13:47 /var/mmfs/etc/RKMcerts
-rw-------. 1 root root 3988 Mar 20 13:47 /var/mmfs/etc/RKMcerts/keystore_name.p12
These security-sensitive files include the following files:
– The RKM.conf file. For more information about this file, see “The RKM.conf file and the RKM
stanza” on page 574.
– The files in the client keystore directory, which include the keystore file, the public and private key
files for the client, and possibly other files. For more information about these files, see “The client
keystore directory and its files” on page 576.
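For example, you can set the required ownership and mode on the configuration file and the keystore file with commands like the following (a sketch; keystore_name.p12 is the placeholder file name from the listing above):
# chown root:root /var/mmfs/etc/RKM.conf /var/mmfs/etc/RKMcerts/keystore_name.p12
# chmod 0600 /var/mmfs/etc/RKM.conf /var/mmfs/etc/RKMcerts/keystore_name.p12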
The command returns yes if the cluster complies with FIPS or no if not.
b. On the SKLM server system, open the SKLMConfig.properties file.
Note: The default location of the SKLMConfig.properties file depends on the operating system:
v On AIX, Linux, and similar operating systems:
/opt/IBM/WebSphere/AppServer/products/sklm/config/SKLMConfig.properties
v On Microsoft Windows:
Drive:\Program Files (x86)\IBM\WebSphere\AppServer\products\sklm\config\
SKLMConfig.properties
c. Add or remove the following line from the SKLMConfig.properties file. Add the line to configure
SKLM to comply with FIPS, or remove it to have SKLM not comply with FIPS.
fips=on
4. Configure the SKLM server to have the same NIST SP800-131a (NIST) setting as the IBM Spectrum
Scale cluster. Follow these steps:
a. Determine the NIST setting of the cluster by entering the following command on the command
line:
mmlsconfig nistCompliance
The command returns SP800-131A if the cluster complies with NIST or off if not.
where:
certUUID
Specifies the UUID that you made a note of in the previous substep.
fileName
Specifies the path and file name of the certificate file in which the server certificate is
stored.
SKLM exports the self-signed server certificate into the specified file.
g. Close the SKLM command line interface.
h. Copy the server certificate file to a temporary directory on the IBM Spectrum Scale node that you
are configuring for encryption.
4. In SKLM, create a device group and keys for the IBM Spectrum Scale cluster:
a. In the SKLM graphical user interface, click Advanced Configuration > Device Group.
b. In the Device Group table, click Create.
c. In the Create Device Group window, follow these steps:
1) Select the GPFS device family.
2) Enter an appropriate name, such as GPFS_Tenant0001. The name is case-sensitive.
3) Make a note of the name. You need it in Part 3 when you create an RKM stanza.
4) Complete any other fields and click Create.
d. After SKLM creates the device group, it prompts you to add devices and keys. Do not add any
devices or keys. Instead, click Close. Keys are created in the next step.
5. Create keys for the device group.
a. In the SKLM graphical user interface, in the Key and Device Management table, select the device
group that you created in Step 4. In these instructions the device group is named GPFS_Tenant0001.
b. Click Go to > Manage keys and services.
c. In the management page for GPFS_Tenant0001, click Add > Key.
d. Enter the following information:
v The number of keys to be created
v The three-letter prefix for key names. The key names are internal SKLM names and are not used
for GPFS encryption.
This subtopic describes how to configure a single RKM back end and how to share the configuration
among multiple nodes in a cluster. To configure multiple RKM back ends, see “Part 4: Configuring more
RKM back ends” on page 613.
You can do Step 1 and Step 2 on any node of the cluster. In later steps you will copy the configuration
files from Step 1 and Step 2 to other nodes in the cluster.
1. Create and configure a client keystore. Follow these steps:
a. Create the following subdirectory to contain the client keystore:
/var/mmfs/etc/RKMcerts
b. The following command creates the client keystore, stores a private key and a client certificate in
it, and also stores the trusted SKLM server certificate into it. From the command line, enter the
following command on one line:
mmauth gencert --cname clientName --cert serverCertFile --out /var/mmfs/etc/RKMcerts/SKLM.p12
--label clientCertLabel --pwd-file passwordFile
Important: Verify that the files in the client keystore directory meet the requirements for
security-sensitive files that are listed in the Requirements section at the beginning of this topic.
2. Create an RKM.conf file and add a stanza to it that describes a connection between a local key client,
an SKLM device group, and an SKLM key server. Each stanza defines an RKM back end.
a. Create a text file with the following path and name:
/var/mmfs/etc/RKM.conf
Important: Verify that the files in the client keystore directory meet the requirements for
security-sensitive files that are listed in the Requirements section at the beginning of this topic.
b. Add a stanza with the following format:
stanzaName {
type = ISKLM
kmipServerUri = tls://raclette.zurich.ibm.com:5696
keyStore = /var/mmfs/etc/RKMcerts/SKLM.p12
passphrase = a_password
clientCertLabel = a_label
tenantName = GPFS_Tenant0001
}
where tenantName is the name that you provide in the last line of the stanza. For example, the
RKM ID for the key server and key client in these instructions is:
raclette_GPFS_Tenant0001.
type Always ISKLM.
kmipServerUri
The DNS name or IP address of the SKLM server and the KMIP SSL port. You can find
this information on the main page of the SKLM graphical user interface. The default port is
5696.
You can have multiple instances of this line, where the first instance represents the
primary key server and each additional instance represents a backup key server. You can
have up to five backup key servers. The following example has the primary key server
and five backup key servers:
stanzaName {
type = ISKLM
kmipServerUri = tls://raclette.zurich.ibm.com:5696
kmipServerUri2 = tls://raclette.fondue2.ibm.com:5696
kmipServerUri3 = tls://raclette.fondue3.ibm.com:5696
kmipServerUri4 = tls://raclette.fondue4.ibm.com:5696
kmipServerUri5 = tls://raclette.fondue5.ibm.com:5696
kmipServerUri6 = tls://raclette.fondue6.ibm.com:5696
keyStore = /var/mmfs/etc/RKMcerts/SKLM.p12
passphrase = a_password
clientCertLabel = a_label
tenantName = GPFS_Tenant0001
}
If the GPFS daemon cannot get an encryption key from the primary key server, it tries the
backup key servers in order.
| For more information, see the subtopics "RKM back ends" and "Adding backup RKM
| servers in a high-availability configuration" in the topic “Preparation for encryption” on
| page 571.
keyStore
The path and name of the client keystore. You specified this parameter in Step 1.
passphrase
The password of the client keystore and client certificate. You specified this parameter in
Step 1.
clientCertLabel
The label of the client certificate in the client keystore. You specified this parameter in Step
1.
Note: The mmchpolicy command in Step 5 will fail if you omit this step. The mmchpolicy command
requires the configuration files to be on the file system manager node.
a. Copy the RKM.conf file from the /var/mmfs/etc directory to the same directory on the file system
manager node.
b. Copy the keystore files that the RKM file references to the same directories on the file system
manager node. The recommended location for the keystore files is /var/mmfs/etc/RKMcerts/.
Important: Verify that the files in the client keystore directory meet the requirements for
security-sensitive files that are listed in the Requirements section at the beginning of this topic.
4. To configure other nodes in the cluster for encryption, copy the RKM.conf file and the keystore files to
those nodes. Copy the files in the same way as you did in Step 3.
Important: Verify that the files in the client keystore directory meet the requirements for
security-sensitive files that are listed in the Requirements section at the beginning of this topic.
5. Install an encryption policy for the cluster:
Note: You can do this step on any node to which you copied the configuration files.
a. Create a policy that instructs GPFS to do the encryption tasks that you want. The following policy
is an example policy. It instructs IBM Spectrum Scale to encrypt all files in the file system with a
file encryption key (FEK) and to wrap the FEK with a master encryption key (MEK):
RULE 'p1' SET POOL 'system' # one placement rule is required at all times
RULE 'Encrypt all files in file system with rule E1'
SET ENCRYPTION 'E1'
WHERE NAME LIKE '%'
RULE 'simpleEncRule' ENCRYPTION 'E1' IS
ALGO 'DEFAULTNISTSP800131A'
KEYS('KEY-326a1906-be46-4983-a63e-29f005fb3a15:SKLM_srv')
In the last line, the character string within single quotation marks (') is the key name. A key name is
a compound of two parts in the following format:
KeyID:RkmID
where:
KeyID
Specifies the UUID of the key that you created in the SKLM graphical user interface in Part
2.
RkmID
Specifies the name of the RKM backend stanza that you created in the
/var/mmfs/etc/RKM.conf file.
| Note: In line six of the preceding example, the default parameter DEFAULTNISTSP800131AFAST can
| be substituted for the default parameter DEFAULTNISTSP800131A following the ALGO keyword. The
| two parameters have the same effect on the policy, but DEFAULTNISTSP800131AFAST provides faster
| runtime performance in certain environments. For more information, see “Encryption policy rules” on
| page 566.
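For example, with the faster parameter the encryption rule from the preceding sample policy would read:
RULE 'simpleEncRule' ENCRYPTION 'E1' IS
ALGO 'DEFAULTNISTSP800131AFAST'
KEYS('KEY-326a1906-be46-4983-a63e-29f005fb3a15:SKLM_srv')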
b. Install the policy rule with the mmchpolicy command.
CAUTION: Installing a new policy with the mmchpolicy command removes all the statements in
the previous policy. To add statements to an existing policy without deleting the previous
contents, collect all policy statements for the file system into one file. Add the new statements to
the file and install the contents of the file with the mmchpolicy command.
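For example, one way to follow this advice is to capture the currently installed policy with the mmlspolicy command, append the new encryption rules, and install the combined file (a sketch; the file system name gpfs0 and the file names /tmp/enc.pol and /tmp/allrules.pol are illustrative):
# mmlspolicy gpfs0 -L > /tmp/allrules.pol
# cat /tmp/enc.pol >> /tmp/allrules.pol
# mmchpolicy gpfs0 /tmp/allrules.pol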
6. Import the client certificate into the SKLM server:
a. On the IBM Spectrum Scale node that you are configuring for encryption, send a KMIP request to
SKLM. To send a KMIP request, try to create an encrypted file on the node. The attempt fails, but
it causes SKLM to put the client certificate in a list of pending certificates in the SKLM key server.
The attempt fails because SKLM does not yet trust the client certificate. See the following example:
# touch /gpfs0/test
touch: cannot touch `/gpfs0/test': Permission denied
# tail -n 2 /var/adm/ras/mmfs.log.latest
Thu Mar 20 14:00:55.029 2014: [E] Unable to open encrypted file: inode 46088,
Fileset fs1, File System gpfs0.
Thu Mar 20 14:00:55.030 2014: [E] Error: key
'KEY-326a1906-be46-4983-a63e-29f005fb3a15:SKLM_srv' could not be fetched (RKM
reported error -1004).
b. In the graphical user interface of SKLM, on the main page, click Pending client device
communication certificates.
c. Find the client certificate in the list and click View.
d. Carefully check that the certificate that you are importing matches the one created in the previous
step, then click Accept and Trust.
e. On the resulting screen, provide a name for the certificate and click Accept and Trust again.
f. On the node that you are configuring for encryption, try to create an encrypted file as you did in
Step (a). This time the command succeeds. Enter an mmlsattr command to list the encryption
attributes of the new file:
# touch /gpfs0/test
# mmlsattr -n gpfs.Encryption /gpfs0/test
file name: /gpfs0/test
gpfs.Encryption: "EAGC????f?????????????? ??????w?^??>???????????? ?L4??
_-???V}f???X????,?G?<sH??0?)??M?????)?KEY-326a1906-be46-4983-a63e-29f005fb3a15?
sklmsrv?)?KEY-6aaa3451-6a0c-4f2e-9f30-d443ff2ac7db?RKMKMIP3?"
EncPar 'AES:256:XTS:FEK:HMACSHA512'
type: wrapped FEK WrpPar 'AES:KWRAP' CmbPar 'XORHMACSHA512'
KEY-326a1906-be46-4983-a63e-29f005fb3a15:sklmsrv
From now on, the encryption policy rule causes each newly created file to be encrypted with a file
encryption key (FEK) that is wrapped in a master encryption key (MEK). You created the key in a
device group in the SKLM server and included its UUID as part of a key name in the security
rule.
This topic describes the regular method for setting up encryption with IBM Security Key Lifecycle
Manager (SKLM) as the key management server and with a certificate signed by a certificate authority
(CA) on the KMIP port of the SKLM server. If your deployment scenario uses a self-signed server
certificate, see one of the following topics:
“Simplified setup: Using SKLM with a self-signed certificate” on page 577
“Regular setup: Using SKLM with a self-signed certificate” on page 606
The regular setup with IBM Security Key Lifecycle Manager (SKLM) requires IBM Spectrum Scale
Advanced Edition or IBM Spectrum Scale Data Management Edition V4.1 or later and SKLM V2.5.0.1 or
later (including V2.6).
Note: If you are using SKLM v2.7 or later, see the topic “Configuring encryption with SKLM v2.7 or
later” on page 623.
The IBM Spectrum Scale node that you are configuring for encryption must have direct network access to
the system where the key server is installed.
CAUTION:
It is a good practice to take the following precautions:
v Ensure that the passphrase for the client certificate file is not leaked through other means, such as
the shell history.
v Take appropriate precautions to ensure that the security-sensitive files are not lost or corrupted. IBM
Spectrum Scale does not manage or replicate the files.
Important: The client keystore must be record-locked when the GPFS daemon starts. If the keystore files
are stored on an NFS mount, the encryption initialization process can hang. The cause is a bug that
affects the way NFS handles record locking. If you encounter this problem, upgrade your version of NFS
or store your keystore file on a local file system. If an upgrade is not possible and no local file system is
available, use a RAM drive to store the keystore files.
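For example, on Linux one way to provide a RAM drive is a tmpfs mount (a sketch only; a tmpfs file system does not survive a reboot, so the keystore files must be copied back into it before GPFS starts):
# mount -t tmpfs -o size=16m,mode=0700 tmpfs /var/mmfs/etc/RKMcerts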
The command returns yes if the cluster complies with FIPS or no if not.
b. On the SKLM server system, open the SKLMConfig.properties file.
Note: The default location of the SKLMConfig.properties file depends on the operating system:
v On AIX, Linux, and similar operating systems the directory is at the following location:
/opt/IBM/WebSphere/AppServer/products/sklm/config/SKLMConfig.properties
v On Microsoft Windows the directory is at the following location:
Drive:\Program Files (x86)\IBM\WebSphere\AppServer\products\sklm\config\
SKLMConfig.properties
c. Add or remove the following line from the SKLMConfig.properties file. Add the line to configure
SKLM to comply with FIPS, or remove it to have SKLM not comply with FIPS.
fips=on
4. Configure the SKLM server to have the same NIST SP800-131a (NIST) setting as the IBM Spectrum
Scale cluster. Follow these steps:
The command returns SP800-131A if the cluster complies with NIST or off if not.
b. On the SKLM server system, open the SKLMConfig.properties file. For the location of this file, see
the note in Step 3.
c. Add the following line to configure SKLM to comply with NIST or remove it to configure SKLM
not to comply with NIST:
TransportListener.ssl.protocols=TLSv1.2
d. For all V2.5.0.x versions of SKLM, if you are configuring SKLM to comply with NIST, modify the
following variable to include only cipher suites that are approved by NIST. The following
statement is all on one line, with no space before or after the comma:
TransportListener.ssl.ciphersuites=TLS_RSA_WITH_AES_256_CBC_SHA256,
TLS_RSA_WITH_AES_128_CBC_SHA256
5. Configure IBM WebSphere Application Server so that it has the same NIST setting as the IBM
Spectrum Scale cluster. See the topic Transitioning WebSphere Application Server to the SP800-131
security standard in the volume WebSphere Application Server Network Deployment in the WebSphere
Application Server online documentation.
v WebSphere Application Server can be configured to run SP800-131 in a transition mode or a
strict mode. The strict mode is recommended.
v When NIST is enabled, make sure that WebSphere Application Server certificate size is at least 2048
bytes and is signed with SHA256withRSA as described in the preceding link.
6. If the cipher suites were set at any time, SKLM 2.6.0.0 has a known issue that causes server
certificates always to be signed with SHA1withRSA. To work around the problem, follow these steps:
a. While the SKLM server is running, in the SKLMConfig.properties file, modify the
requireSHA2Signatures property as follows:
requireSHA2Signatures=true
b. Do not restart the server.
c. Generate a new server certificate signing request (CSR) to a third-party certificate authority (CA)
and send it to the CA.
d. When you receive the certificate from the third-party CA, import it into SKLM and set it to be the
certificate in use. For more information, see the next subtopic.
e. If you restart the server, you must repeat this workaround before you can create a server
certificate that is signed other than with SHA1withRSA.
Note: For more information about the steps in this subtopic, see the steps that are described in the SKLM
documentation, in the topic "Scenario: Request for a third-party certificate" at http://www.ibm.com/
support/knowledgecenter/en/SSWPVP_2.7.0/com.ibm.sklm.doc/scenarios/cpt/
cpt_ic_scenar_ca_certusage.html.
1. Create a certificate signing request (CSR) with the SKLM command-line interface:
a. On the SKLM server system, open a command-line window.
b.
c. Change to the WAS_HOME/bin directory. The location of this directory depends on the operating
system:
v On AIX, Linux, and similar operating systems, the directory is at the following location:
where:
-alias labelCsr
Specifies the certificate label of the CSR.
-cn server
Specifies the common name of the server in the certificate.
-validity daysValid
Specifies the validity period of the certificate in days.
-keyStoreName defaultKeyStore
Specifies the keystore name within SKLM where the CSR is stored. Typically, you should
specify defaultKeyStore as the name here.
-fileName fileName
Specifies the fully qualified path of the directory where the CSR is stored on the SKLM
server system, for example /root/sklmServer.csr.
-usage SSLSERVER
Specifies how the generated certificate is used in SKLM.
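For illustration only, a CSR request of roughly the following form can be issued from the SKLM command-line interface. This sketch assumes the wsadmin-based tklmCertGenRequest command; the alias, common name, validity period, and file name are example values, and the authoritative syntax is in the SKLM documentation:
print AdminTask.tklmCertGenRequest('[-alias sklmServerCSR -cn keyserver01.gpfs.net -validity 365 -keyStoreName defaultKeyStore -fileName /root/sklmServer.csr -usage SSLSERVER]')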
Important: You must also obtain and copy the intermediate certificate files of the certificate chain of
authority into the same temporary directory. The intermediate certificates might be included with the
generated certificate file, or you might have to obtain the intermediate certificates separately.
Whatever the method, you must have a separate certificate file for the root certificate and for each
intermediate certificate in the chain of authority. You need these certificate files in Part 3.
4. Import the root certificate into the SKLM server with the SKLM graphical user interface:
a. On the Welcome page, in the Action Items section, in the Key Groups and Certificates area, click
You have pending certificates.
b. In the Pending Certificates table, click the certificate that you want to import and click Import.
c. In the File name and location field, type the path and file name of the certificate file and click
Import.
5. In SKLM, create a device group for the IBM Spectrum Scale cluster:
a. In the SKLM graphical user interface, click Advanced Configuration > Device Group.
where:
--prefix /var/mmfs/etc/RKMcerts/SKLM
Specifies the path and file name prefix of the client credential files that are generated.
--cname clientName
Specifies the name of the client in the certificate.
--pwd-file passwordFile
Specifies the path of a text file that contains the password for the client keystore. The
password must be 1 - 20 characters in length.
--fips fipsVal
Specifies the current FIPS 140-2 compliance mode of the IBM Spectrum Scale cluster. Valid
values are on and off. To find the current mode, enter the following command:
mmlsconfig fips1402mode
--nist nistVal
Specifies the current NIST SP 800-131A compliance mode of the IBM Spectrum Scale cluster.
Valid values are on and off. To find the current mode, enter the following command:
mmlsconfig nistCompliance
where:
--cert /var/mmfs/etc/RKMcerts/SKLM.cert
Specifies the path of the client certificate file. The path was specified in the --prefix
parameter Step 2. The file suffix is .cert.
--priv /var/mmfs/etc/RKMcerts/SKLM.priv
Specifies the path of the client private key file. The path was specified in the --prefix
parameter in the Step 2. The file suffix is .priv.
--label clientCertLabel
Specifies the label of the client certificate in the keystore.
--pwd-file passwordFile
Specifies the path of a text file that contains the password for the client keystore. The
password must be 1 - 20 characters in length.
--out /var/mmfs/etc/RKMcerts/SKLM.p12
Specifies the path of the client keystore.
--fips fipsVal
Specifies the current FIPS 140-2 compliance mode of the IBM Spectrum Scale cluster. Valid
values are on and off. To find the current mode, enter the following command:
mmlsconfig fips1402mode
--nist nistVal
Specifies the current NIST SP 800-131A compliance mode of the IBM Spectrum Scale cluster.
Valid values are on and off. To find the current mode, enter the following command:
mmlsconfig nistCompliance
4. Copy the certificate files of the certificate chain, including the root certificate file, from the temporary
directory to the directory that contains the client keystore. For more information, see Part 2. Rename
each file with the same prefix, with a numeral that indicates the order of the certificate in the chain,
and with the suffix .cert. Start the numbering with 0 for the root certificate. For example, if there are
three files in the chain, and the prefix is sklmChain, rename the files as follows:
sklmChain0.cert
sklmChain1.cert
sklmChain2.cert
5. Enter the following command to verify the certificate chain. The command is all on one line:
openssl verify -CAfile /var/mmfs/etc/RKMcerts/sklmChain0.cert
-untrusted /var/mmfs/etc/RKMcerts/sklmChain1.cert
/var/mmfs/etc/RKMcerts/sklmChain2.cert
where:
--prefix /var/mmfs/etc/RKMcerts/sklmChain
Specifies the path and the file name prefix of the files in the certificate chain. The mmgskkm
command trusts all the files that have the specified prefix and a .cert suffix. For example, if
there are three certificates in the chain and the prefix is /var/mmfs/etc/RKMcerts/
sklmChain, then the command trusts the following certificates:
/var/mmfs/etc/RKMcerts/sklmChain0.cert
/var/mmfs/etc/RKMcerts/sklmChain1.cert
/var/mmfs/etc/RKMcerts/sklmChain2.cert
--pwd-file passwordFile
Specifies the path of a text file that contains the password of the client keystore.
--out /var/mmfs/etc/RKMcerts/SKLM.p12
Specifies the path of the client keystore.
--label labelChain
Specifies the prefix of the label for the server certificate chain in the client keystore.
--fips fipsVal
Specifies the current FIPS 140-2 compliance mode of the IBM Spectrum Scale cluster. Valid
values are on and off. To find the current mode, enter the following command:
mmlsconfig fips1402mode
--nist nistVal
Specifies the current NIST SP 800-131A compliance mode of the IBM Spectrum Scale cluster.
Valid values are on and off. To find the current mode, enter the following command:
mmlsconfig nistCompliance
Important: The new keystore must be record-locked when the GPFS daemon starts. If the keystore
files are stored on an NFS mount, the encryption initialization process can hang. The cause is a bug
that affects the way NFS handles record locking. If you encounter this problem, upgrade your version
of NFS or store your keystore file on a local file system. If an upgrade is not possible and no local file
system is available, use a RAM drive to store the keystore files.
7. Create an RKM.conf file and add a stanza to it that contains the information that is necessary to
connect to the SKLM key server. The RKM.conf file must contain a stanza for each connection between
a key client, an SKLM device group, and a key server.
a. In a text editor, create a new text file with the following path and name:
/var/mmfs/etc/RKM.conf
b. Add a stanza with the following format:
where tenantName is the name that you provide in the last line of the stanza. For example, the
RKM ID for the key server and key client in these instructions is:
raclette_GPFS_Tenant0001.
type Always ISKLM.
kmipServerUri
The DNS name or IP address of the SKLM server and the KMIP SSL port. You can find
this information on the main page of the SKLM graphical user interface. The default port is
5696.
You can have multiple instances of this line, where each instance represents a different
backup key server. The following example has the primary key server and two backup key
servers:
stanzaName {
type = ISKLM
kmipServerUri = tls://raclette.zurich.ibm.com:5696
kmipServerUri = tls://raclette.fondue.ibm.com:5696
kmipServerUri = tls://raclette.fondue2.ibm.com:5696
keyStore = /var/mmfs/etc/RKMcerts/SKLM.p12
passphrase = a_password
clientCertLabel = a_label
tenantName = GPFS_Tenant0001
}
If the GPFS daemon cannot get an encryption key from the primary key server, it tries the
backup key servers in order.
keyStore
The path and name of the client keystore.
passphrase
The password of the client keystore and client certificate.
clientCertLabel
The label of the client certificate in the client keystore.
tenantName
The name of the SKLM device group. See “Part 1: Installing Security Key Lifecycle
Manager” on page 615.
8. Set up an encryption policy on the node that you are configuring for encryption.
In the last line of the policy, the character string within single quotation marks (') is the key name.
A key name is a compound of two parts in the following format:
KeyID:RkmID
where:
KeyID Specifies the UUID of the key that you created in the SKLM graphical user interface in Part
2.
RkmID Specifies the name of the RKM backend stanza that you created in the
/var/mmfs/etc/RKM.conf file.
b. Issue the mmchpolicy command to install the rule.
CAUTION:
Installing a new policy with the mmchpolicy command removes all the statements in the
previous policy. To add statements to an existing policy without deleting the previous contents,
collect all policy statements for the file system into one file. Add the new statements to the file
and install the contents of the file with the mmchpolicy command.
9. Import the client certificate into the SKLM server:
a. On the IBM Spectrum Scale node that you are configuring for encryption, send a KMIP request to
SKLM. To send a KMIP request, try to create an encrypted file on the node. The attempt fails, but
it causes SKLM to put the client certificate in a list of pending certificates in the SKLM key server.
The attempt fails because SKLM does not yet trust the client certificate. See the following example:
# touch /gpfs0/test
touch: cannot touch `/gpfs0/test': Permission denied
# tail -n 2 /var/adm/ras/mmfs.log.latest
Thu Mar 20 14:00:55.029 2014: [E] Unable to open encrypted file: inode 46088,
Fileset fs1, File System gpfs0.
Thu Mar 20 14:00:55.030 2014: [E] Error: key
'KEY-326a1906-be46-4983-a63e-29f005fb3a15:SKLM_srv' could not be fetched (RKM
reported error -1004).
b. In the graphical user interface of SKLM, on the main page, click Pending client device
communication certificates.
c. Find the client certificate in the list and click View.
d. Carefully check that the certificate that you are importing matches the one created in the previous
step, then click Accept and Trust.
e. On the resulting screen, provide a name for the certificate and click Accept and Trust again.
f. On the node that you are configuring for encryption, try to create an encrypted file as you did in
Step (a). This time the command succeeds. Enter an mmlsattr command to list the encryption
attributes of the new file:
# touch /gpfs0/test
# mmlsattr -n gpfs.Encryption /gpfs0/test
file name: /gpfs0/test
gpfs.Encryption: "EAGC????f?????????????? ??????w?^??>???????????? ?L4??
_-???V}f???X????,?G?<sH??0?)??M?????)?KEY-326a1906-be46-4983-a63e-29f005fb3a15?
From now on, the encryption policy rule causes each newly created file to be encrypted with a file
encryption key (FEK) that is wrapped in a master encryption key (MEK). You created the key in a
device group in the SKLM server and included its UUID as part of a key name in the security rule.
Important: See the security note and the caution at the beginning of this topic, before Part 1.
To configure a single SKLM server to use one-character Instance IDs, follow these steps.
Note: These instructions use SKLM v2.7 as an example. The procedure with later versions of SKLM is
similar.
1. Stop the SKLM server.
Note: The location of the DB2/bin directory depends on the operating system:
v On AIX, Linux, and similar operating systems, the directory is at the following location:
/opt/IBM/DB2SKLMV27/bin
v On Microsoft Windows, the directory is at the following location:
Drive:\Program Files\IBM\DB2SKLMV27\bin
If SKLM uses a preexisting DB2 installation, then the location of the bin directory might be different
and might be on another system.
3. Start the DB2 command-line tool. The method depends on the operating system:
v On AIX, Linux, and similar operating systems, enter the following command:
./db2
v On Microsoft Windows, enter the following command:
db2
4. At the db2 command-line prompt, enter the following command to list the database directory:
list database directory
DB2 displays output like the following example:
System Database Directory
Number of entries in the directory = 1
Database 1 entry:
where:
database
Specifies the database name from the previous step.
userName
Specifies the SKLM DB2 user name that you set during SKLM installation. The default value
is sklmdb27.
password
Specifies the SKLM DB2 password that you set during SKLM installation.
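For example, the connection to the database (the command whose parameters are described above) might look like the following sketch. The database name SKLMDB27 and the user sklmdb27 are the SKLM v2.7 defaults and are assumptions here; use the values from your installation:
connect to SKLMDB27 user sklmdb27 using yourPassword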
6. Enter the following command to change the SKLM instance ID. The command is on one line:
update KMT_CFGT_INSTDETAILS set INSTANCEID='1' where INSTANCEID in
(select INSTANCEID from KMT_CFGT_INSTDETAILS)
where 1 is the one-character Instance ID that you want to set. DB2 displays output like the following
example:
DB20000I The SQL command completed successfully.
7. Enter the following command to commit the change:
commit
The server now generates UUIDs that have a maximum length of 42 characters.
To configure the SKLM servers in a high-availability (HA) cluster, follow these steps.
1. Stop the SKLM server on all the nodes of the cluster.
2. For each node in the cluster, follow the steps in the preceding subtopic "Resolving the UUID length
problem for a single server". Set the Instance ID to a separate value in each node.
3. Start the SKLM server on all nodes in the cluster.
The servers now generate UUIDs that have a maximum length of 42 characters.
In SKLM v2.6, the default Representational State Transfer (REST) port is 9080. In SKLM v2.7 and later, the
default REST port is 443.
To change the REST port number that IBM Spectrum Scale uses to connect to SKLM to 443, enter the
following command:
mmkeyserv server add ServerName --port 443
For more information, see “Simplified setup: Using SKLM with a self-signed certificate” on page 577.
The reason is that these commands are configured with the SKLM v2.5 or v2.6 default REST port, whose
value is different in v2.7 and later. For more information, see the previous subtopic on changing the REST
port.
Note: You do not have to delete and re-create encryption keys. You can use the same encryption keys
in SKLM v2.7 that you configured in SKLM v2.5 or v2.6.
The IBM Spectrum Scale node that you are configuring for encryption must have direct network access to
the system where the key server is installed.
Note: In the simplified setup, the mmkeyserv command sets the permission bits automatically.
v For the regular setup and the Vormetric DSM setup:
-rw-------. 1 root root 2446 Mar 20 12:15 /var/mmfs/etc/RKM.conf
drw-------. 2 root root 4096 Mar 20 13:47 /var/mmfs/etc/RKMcerts
-rw-------. 1 root root 3988 Mar 20 13:47 /var/mmfs/etc/RKMcerts/keystore_name.p12
Important: The client keystore must be record-locked when the GPFS daemon starts. If the keystore files
are stored on an NFS mount, the encryption initialization process can hang. The cause is a bug that
affects the way NFS handles record locking. If you encounter this problem, upgrade your version of NFS
or store your keystore file on a local file system. If an upgrade is not possible and no local file system is
available, use a RAM drive to store the keystore files.
where:
--prefix prefix
Specifies the path and file name prefix of the directory where the output files are generated.
For example, if you want directory /var/mmfs/etc/RKMcerts to contain the output files, and
you want the output files to have the prefix kcVormetric, you can specify the parameter as
follows:
--prefix /var/mmfs/etc/RKMcerts/kcVormetric
--cname cname
Specifies the name of the IBM Spectrum Scale key client. Valid characters are alphanumeric
characters, hyphen (-), and period (.). The name can be up to 54 characters long. In Vormetric
DSM, names are not case-sensitive, so the use of uppercase letters is not recommended. For
more information, see the Vormetric DSM documentation.
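For illustration, the command that creates the client credentials might take roughly the following form (a sketch only; the mmgskkm gen subcommand and its --pwd, --fips, and --nist options are assumptions, and only the --prefix and --cname parameters are described above):
/usr/lpp/mmfs/bin/mmgskkm gen --prefix /var/mmfs/etc/RKMcerts/kcVormetric --cname kcVormetric --pwd pwpkVormetric --fips off --nist on
The output includes the client certificate and private key files, for example kcVormetric.cert and kcVormetric.priv, which are used in the next step.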
where:
--cert certFile
Specifies the client certificate file that you created in Step 1.
--priv privFile
Specifies the private key file that you created in Step 1.
--label label
Specifies the label under which the private key is stored in the keystore.
--pwd pwd
Specifies the password of the keystore. You can use the same password that you specified for
the private key in Step 1.
--out keystore
The file name of the keystore.
In the following example, the current directory contains the client credentials from Step 1. The
command is entered on one line:
mmgskkm store --cert kcVormetric.cert --priv kcVormetric.priv --label lapkVormetric
--pwd pwpkVormetric --out ksVormetric.keystore
The output file is a keystore that contains the client credentials of the key client:
ksVormetric.keystore
Important: The keystore must be record-locked when the GPFS daemon starts. If the keystore files are
stored on an NFS mount, the encryption initialization process can hang. The cause is a bug that
affects the way NFS handles record locking. If you encounter this problem, upgrade your version of
NFS or store your keystore file on a local file system. If an upgrade is not possible and no local file
system is available, use a RAM drive to store the keystore files.
3. Retrieve the certificate chain of the Vormetric DSM server.
Note: DSM does not support the use of imported server certificate chains for the TLS communication
on the KMIP port. You must create and use a server certificate chain signed by the DSM internal
certificate authority (CA).
Enter the following command on one line:
/usr/lpp/mmfs/bin/mmsklmconfig restcert --host host --port port --prefix prefix --keystore keystore
--keypass keypass --fips fips --nist nist
where:
--host host
Specifies the name or IP address of the remote system where the Vormetric DSM server is
running.
--port port
Specifies the port on the remote system for communicating with the Vormetric DSM server
(default 8445).
--prefix prefix
Specifies the path and file name prefix of the directory where the files in the certificate chain
are stored. For example, if you want to store the certificate chain in the directory
/var/mmfs/etc/RKMcerts, and you want the certificate files to have the prefix DSMServer, you
can specify the parameter as follows:
--prefix /var/mmfs/etc/RKMcerts/DSMServer
--keystore keystore
Specifies the path and file name of the client keystore that you created in Step 2.
--keypass keypass
Specifies a text file that contains the password of the client keystore as the first line. You must
create this text file. Store the password that you provided in Step 2.
--fips fips
Specifies whether the key client complies with FIPS 140-2. Specify on or off.
--nist nist
Specifies whether the key client complies with NIST SP800-131a. Specify on or off.
In the following example, the current directory contains the client keystore that was created in Step 2.
Enter the command on one line:
/usr/lpp/mmfs/bin/mmsklmconfig restcert --host hostVormetric --port 8445 --prefix DSM
--keystore ksVormetric.keystore --keypass keypass --fips off --nist on
The command connects to the DSM server, retrieves the server certificate chain, and stores each
certificate into a separate local file in Base64-encoded DER format. Each file name has the format
prefixN.cert, where prefix is the prefix that you specified in the command and N is a digit that begins
at 0 and increases by 1 for each certificate in the chain, as in the following example:
DSM0.cert
DSM1.cert
4. Verify that the SHA-256 fingerprint in each retrieved certificate matches the fingerprint of the DSM
server:
a. To display the details of each certificate, enter the following sequence at the client command line,
where prefix is the prefix that you provided in Step 3:
for c in prefix*.cert; do /usr/lpp/mmfs/bin/mmgskkm print --cert $c; done
b. Log in to the graphical user interface of the DSM server and display its SHA-256 fingerprint.
c. Verify that the fingerprints in the certificates match the fingerprint in the DSM server.
where:
--prefix prefix
Specifies the prefix that you specified in Step 3.
--pwd pwd
Specifies the password of the client keystore, which you provided in Step 3.
--out keystore
Specifies the path name of the keystore of the key client.
--label serverLabel
Specifies the label under which the server certificate chain is stored in the client keystore.
--fips fips
Specifies whether the key client complies with FIPS 140-2. Specify on or off.
--nist nist
Specifies whether the key client complies with NIST SP800-131a. Specify on or off.
In the following example, the current directory contains the client keystore and the certificate chain.
Enter the following command on one line:
/usr/lpp/mmfs/bin/mmgskkm trust --prefix DSM --pwd pwpkVormetric --out ksVormetric.keystore
--label laccVormetric --fips off --nist on
In DSM, a host is a system to which DSM provides security services. In these instructions, the host is the
IBM Spectrum Scale node that you are configuring for encryption. A DSM domain is an administrative
group of one or more hosts. In these instructions, the domain contains the single IBM Spectrum Scale
node. For more complex configurations, see the DSM product documentation.
1. Install a Key Management Interoperability Protocol (KMIP)-enabled license in DSM.
Important: You must complete this step before you create a DSM domain. For security reasons, you
cannot create a KMIP-enabled domain in DSM until you install a KMIP-enabled license. For example,
you cannot create a regular domain, install a KMIP-enabled license, and then convert the domain to a
KMIP-enabled domain.
a. On the DSM Management Console, click System > License.
b. Select a KMIP-enabled license that you obtained from DSM.
c. Click Upload License File.
The license is installed.
2. Create a DSM domain.
a. On the DSM Management Console, click Domains > Manage Domains.
b. Follow the instructions to create a domain. Make sure that you configure the domain as KMIP
Supported.
3. Create a Domain and Security Administrator for the new domain.
Note: The passwords are temporary. The new administrator must enter a new password on the
first login to the DSM Management Console.
d. Click OK.
e. Limit the scope of the administrator's control to the domain that you created in Step 2.
4. Add a host to the domain.
a. Log in as the new administrator:
1) Enter a password when prompted.
2) Select I am a local domain administrator.
3) Enter or select the domain name from Step 2.
b. On the Management Console, click Hosts > Hosts.
c. On the Hosts screen, click Add to add a KMIP host. Set the Host Name to the name that you
specified for the key client (the value for the cname parameter) when you created the client
credentials in Part 1. In these instructions, the key client name is kcVormetric.
d. In the list of hosts, select the host that you created in the previous step. Click Import KMIP Cert.
If no Import KMIP Cert button is displayed, verify that the DSM license is KMIP-enabled and that
you created the domain after you installed the KMIP-enabled license.
e. In the window that opens, go through the directories of the IBM Spectrum Scale node to the
directory that contains the client certificate file. Select the certificate file.
5. Create one or more keys for the client to use as master encryption keys (MEKs).
a. From the DSM Management Console, click Keys > Key Templates. Follow the instructions to
create a key template. Select AES256 as the key algorithm.
b. Create a key from the template. Specify a name for the key and then select the template.
c. Make a note of the UUID of the key. You need it in Part 3.
where keyClientName is the key client name from Part 1, Step 1. For example, the RKM ID
for the key server and key client in these instructions is: raclette_kcVormetric.
type Always KMIP for the Vormetric DSM server.
kmipServerUri
The DNS name or IP address of the DSM server and the DSM SSL port. Multiple
kmipServerUri entries may be added for high availability (HA), but note that the DSM
servers must then be configured in an active-active setup. In the regular DSM HA setup,
the passive failover nodes do not serve keys over KMIP. For more information, consult the
Vormetric DSM documentation.
keyStore
The path and name of the client keystore from Part 1.
passphrase
The password of the client keystore and client certificate from Part 1.
clientCertLabel
The label of the client certificate in the client keystore from Part 1.
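Putting these values together, an RKM.conf stanza for this setup might look like the following sketch. The stanza name and the keystore values are the example values from Part 1; the host name hostVormetric and the KMIP port 5696 are assumptions, so use the DSM SSL port from your DSM configuration:
raclette_kcVormetric {
type = KMIP
kmipServerUri = tls://hostVormetric:5696
keyStore = /var/mmfs/etc/RKMcerts/ksVormetric.keystore
passphrase = pwpkVormetric
clientCertLabel = lapkVormetric
}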
2. Set up an encryption policy on the node that you are configuring for encryption.
a. Create a policy that instructs GPFS to do the encryption tasks that you want. The following policy
is an example policy. It instructs IBM Spectrum Scale to encrypt all files in the file system with a
file encryption key (FEK) and to wrap the FEK with a master encryption key (MEK):
RULE 'p1' SET POOL 'system' # one placement rule is required at all times
RULE 'Encrypt all files in file system with rule E1'
SET ENCRYPTION 'E1'
WHERE NAME LIKE '%'
RULE 'simpleEncRule' ENCRYPTION 'E1' IS
ALGO 'DEFAULTNISTSP800131A'
KEYS('01-10:raclette_kcVormetric')
In the last line, the character string within single quotation marks (') is the key name. A key name is
a compound of two parts in the following format:
KeyID:RkmID
where:
KeyID
Specifies the UUID of the master encryption key that you created in the DSM Management
Console in Part 2.
RkmID
Specifies the name of the RKM stanza that you created in the /var/mmfs/etc/RKM.conf file in
Step 1.
b. Install the policy rule with the mmchpolicy command.
CAUTION:
Installing a new policy with the mmchpolicy command removes all the statements in the
previous policy. To add statements to an existing policy without deleting the previous contents,
collect all policy statements for the file system into one file. Add the new statements to the file
and install the contents of the file with the mmchpolicy command.
From now on, the encryption policy rule causes each newly created file to be encrypted with a file
encryption key (FEK) that is wrapped in a master encryption key (MEK).
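For example, if the policy statements above are collected in a file, you can install them and verify the
currently installed policy as follows. The file system name gpfs1 and the policy file name are illustrative:
mmchpolicy gpfs1 /tmp/encrypt.pol
mmlspolicy gpfs1 -L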
| Warnings are issued for both RKM server certificates and key client certificates.
| Note: To renew an expired server or client certificate, see the topic Renewing client and server
| certificates.
| A warning message for an RKM server certificate that is approaching its expiration date contains the date
| and time of expiration and the IP address and port of the RKM server, as in the following example. In the
| log file this message would be printed all on one line:
| 2018-08-01_11:45:09.341-0400: GPFS: 6027-3732 [W] The server certificate for key
| server 192.168.9.135 (port 5696) will expire at Aug 01 12:03:32 2018 EDT (-0400).
| With this information you can log on to the specified RKM server and find the server certificate that is
| approaching expiration.
| A warning message for a key client certificate that is approaching its expiration date contains the date
| and time of the expiration and the IP address and port of the RKM server to which the key client has a
| connection. It does not contain the label of the client certificate. In the log file this message would be
| printed all on one line:
| 2018-08-01_11:45:09.341-0400: GPFS: 6027-3731 [W] The client certificate for key
| server 192.168.9.135 (port 5696) will expire at Aug 01 12:28:13 2018 EDT (-0400).
| The procedure for identifying an expiring client certificate based on the RKM server information in the
| error message depends on two circumstances:
| v Whether more than one key client in the cluster has a connection with the RKM server that is specified
| in the error message.
| The following instructions assume that only one key client in the cluster has a connection with the
| specified RKM server:
| v Simplified method: If the encryption environment is configured by the simplified method, follow these
| steps:
| 1. Make a note of the following information:
| – The expiration date of the client certificate from the warning message.
| – The IP address and port of the RKM server from the error message.
| – The host name of the RKM server that uses that IP address and port. Look this item up in your
| system information.
| 2. On the command line of a node in the cluster, issue the following command to list the key clients
| for the RKM server:
| mmkeyserv client show -server <host_ID>
| where <host_ID> is the IP address or host name of the RKM server from Step 1.
| 3. For each key client the command displays a block of information that includes the client certificate
| label, the host name or IP address and the port of the RKM server, and other information.
| 4. This set of instructions assumes that only one key client in the cluster has a connection with the
| specified RKM server. Therefore, in Step 3 the command displays only one block of information.
| The label that is listed in this block of information is the label of the client certificate that is
| approaching expiration.
| v Regular method: If the encryption environment is configured by the regular method, follow these
| steps:
| 1. Make a note of the following information:
| – The expiration date of the client certificate from the warning message
| – The IP address and port of the RKM server from the error message.
| – The host name of the RKM server that uses that IP address and port. Look this item up in your
| system information.
| 2. On a node of the cluster that accesses encrypted files – that is, on a node that is successfully
| configured for encryption – open the RKM.conf file with a text editor. For more information about
| the RKM.conf file, see the topic “Preparation for encryption” on page 571.
| 3. In the RKM.conf file, follow these steps:
| a. Find the stanza that contains the host name or IP address and the port of the RKM server from
| Step 1. This information is specified in the kmipServerUri parameter of the stanza.
| b. The client certificate label that is specified in that same stanza is the label of the client
| certificate that is approaching expiration.
| c. Make a note of the path of the keystore and the keystore password that are also specified in the
| stanza. You can use this information to open the keystore with a tool such as the openssl
| key-management utility and inspect the certificate.
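| For example, assuming the keystore is a PKCS#12 file, you can extract its certificates with the openssl
| utility and display the subject and expiration date of the client certificate. The keystore path is
| illustrative, and openssl prompts for the keystore password from the stanza:
| openssl pkcs12 -in /var/mmfs/etc/RKMcerts/keystore.p12 -nokeys -clcerts -out /tmp/clientcerts.pem
| openssl x509 -in /tmp/clientcerts.pem -noout -subject -enddate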
| If more than one key client in the cluster might have a connection with the RKM server that is specified
| in the error message, then you must identify each such key client and search its keystore to find the
| certificate that is approaching expiration. The following instructions are for both the simplified setup
| method and the regular setup method:
| 1. Make a note of the expiration date of the client certificate and the IP address and port of the RKM
| server in the error message. Also look up the host name of the RKM server.
| 2. List the stanzas of the RKM.conf file:
| v For the simplified setup method, issue the following command from the command line:
| IBM Spectrum Scale checks certificate expiration dates only when the certificates are being used to
| authenticate a connection between a key client and a key server.
| IBM Spectrum Scale checks the certificate expiration dates of a key client and its RKM server at regular
| intervals, currently every 15 minutes. The first check occurs when the key client connects with the server
| to obtain a master encryption key (MEK), which it stores in a local cache on the network node.
| Subsequent checks occur regularly as the key client periodically reconnects with the RKM server so that it
| can refresh the MEK in the local cache. The current refresh interval is 15 minutes.
| IBM Spectrum Scale does not check the certificate expiration dates of client or server certificates that are
| not currently being used in this way. This category includes not-in-use client certificates in local keystores
| and not-in-use server certificates for RKM backup servers.
| Frequency of warnings
| The frequency of warnings increases as the expiration date nears, as the following table illustrates:
| Table 56. Frequency of warnings
| Time before expiration    Frequency of warnings
| More than 90 days         No warnings are logged.
| 30 - 90 days              Every seven days.
| 7 days - 30 days          Every 24 hours.
| 24 hours - 7 days         Every 60 minutes.
| Less than 24 hours        Every 15 minutes.
|
| A first warning is issued when both of the following conditions become true:
| v At least 75 percent of the certificate validity period has passed.
| v The time that remains falls within one of the warning windows.
| Subsequent warnings are issued with the frequency that is listed in the second column of the preceding
| table. For example, if the validity period is 30 days and begins at midnight on March 1, then the
| warnings are issued as shown in the following list:
| First warning: March 22 at 12:00 noon (.75 * 30 days = 22.5 days).
| Second warning: March 23 at 12:00 noon (7.5 days remaining).
| Third warning: March 24 at 12:00 noon (6.5 days remaining).
| Warnings: Every 60 minutes from March 24 at 1:00 PM until March 29 at 12:00 midnight.
| Warnings: Every 15 minutes from March 29 at 12:15 AM until March 30 at midnight.
During encryption, the GPFS daemon acts as a key client and requests master encryption keys (MEKs)
from a Remote Key Management (RKM) server. The supported RKM servers are IBM Security Key
Lifecycle Manager (SKLM) and Vormetric Data Security Manager (DSM).
| When a digital client or server certificate expires, the IBM Spectrum Scale client cannot access encrypted
| files, because it can no longer retrieve MEKs from the RKM server. The following topics describe how to
| recognize certificate expiration errors and how to renew client and server certificates.
| MEKs do not expire unless they are explicitly removed from a key server.
The following table shows the default lifetimes of client and server certificates:
Table 57. Comparing default lifetimes of key server and key client certificates
Item                                        Type of certificate   Default lifetime
IBM Spectrum Scale                          Client                3 years
IBM Security Key Lifecycle Manager (SKLM)   Server                3 years
Vormetric Data Security Manager (DSM)       Server                10 years
When the certificate of an RKM server expires, IBM Spectrum Scale can no longer retrieve master
encryption keys (MEKs) from the server. The result is that attempts to create, open, read, or write
encrypted files fail with an "Operation not permitted" error. Each time that an error occurs, IBM Spectrum
Scale writes error messages like the following ones to the /var/adm/ras/mmfs.log.latest log file:
[W] The key server sklm1 (port 5696) had a failure and will be
quarantined for 1 minute(s).
[E] Unable to create encrypted file testfile.enc (inode 21260,
fileset 0, file system gpfs1).
[E] Key 'KEY-uuid:sklm1' could not be fetched. Bad certificate.
IBM Spectrum Scale checks the status of a key client certificate each time it loads a keystore. It loads a
keystore whenever a file system is mounted, a new policy is applied, or an RKM.conf configuration file is
explicitly loaded with the tsloadikm run command.
When IBM Spectrum Scale detects an expired client certificate, it writes one or more of the following
error messages to the /var/adm/ras/mmfs.log.latest log file or to the console or to both, depending on
the action that you took when the problem occurred:
[E] Error while validating policy 'policy.enc': rc=778:
While parsing file '/var/mmfs/etc/RKM.conf':
[E] Incorrect client certificate label 'GPFSlabel' for backend 'sklm2'.
[E] Error while validating policy 'policy.enc': rc=778:
While parsing file '/var/mmfs/etc/RKM.conf':
[E] Certificate with label 'GPFSlabel' for backend 'sklm2' has expired.
where <serverName> is the name of the key server object that you want to update.
3. Enter the SKLMAdmin administrator password when prompted.
4. Enter yes to trust the SKLM REST certificate.
The key server object is updated with the self-signed server certificate.
where <serverName> is the name of the key server object that you want to update.
4. Enter the SKLMAdmin administrator password when prompted.
5. Enter yes to trust the SKLM REST certificate.
The IBM Spectrum Scale client now trusts the new SKLM WebSphere Application Server certificate.
where:
--host <sklmhost>
Is the IP address or host name of the RKM server.
--port <kmipport>
Is the KMIP port number of the SKLM server. The default value is 5696.
--prefix <sklmChain>
Is the path and file name prefix where the server certificate files are to be stored.
--keystore <rkmKeystore>
Is the path and file name of the client keystore from Step 1.
--keypass <rkmPassfile>
Is the path and file name of the keystore password file from Step 2.
where sklmChain is the path and file name prefix of the certificate files. You specified this prefix in
Step 3.
5. Issue the following command to add the retrieved server certificate to the client keystore. The
mmgskkm command is available in IBM Spectrum Scale v4.2.1 and later:
mmgskkm trust --prefix <sklmChain> --out <rkmKeystore> --pwd-file <rkmPassfile>
--label <serverLabel>
where:
--prefix <sklmChain>
Is the path and file name prefix of the server certificate files. You specified this prefix in Step
3.
--out <rkmKeystore>
Is the path and file name of the client keystore from Step 1.
--pwd-file <rkmPassfile>
Is the path and file name of the client keystore password file that you created in Step 2.
--label <serverLabel>
Is the label under which to store the server certificate in the client keystore.
Note: The label must be unique in the keystore. In particular, it cannot be the label of the
expired server certificate from the SKLM key server.
6. Copy the updated client keystore file to all the nodes in the IBM Spectrum Scale cluster.
7. Reload the new client keystore by one of the following methods:
v On any administration node in the cluster, run the mmchpolicy command to refresh the current
policy rules. You do not need to repeat this action on other nodes in the cluster.
v On each node of the cluster, unmount and mount the file system.
v In IBM Spectrum Scale v4.2.1 and later, issue the following command on each node of the cluster:
/usr/lpp/mmfs/bin/tsloadikm run
The IBM Spectrum Scale client now trusts the new self-signed SKLM server certificate.
where:
-CAfile <rootCaCert>
Specifies the root certificate file.
-untrusted <intermediateCaCerts>
Specifies the file that contains the intermediate certificates. If the chain has more than one
intermediate certificate, you must combine them into a single file. If the chain has no
intermediate certificates, omit this parameter.
<endpointCert>
Specifies the endpoint certificate file.
For example, if your server certificate chain consists of the three sample files that are listed in Step 3,
issue the following command:
openssl verify -CAfile /root/sklmChain0.cert -untrusted /root/sklmChain1.cert /root/sklmChain2.cert
5. Issue the following command to add the new SKLM server certificate chain to the keystore. The
mmgskkm command is available in IBM Spectrum Scale v4.2.1 and later.
mmgskkm trust --prefix <sklmChain> --out <keystore> --pwd-file <pwd-file>
--label <serverLabel>
Note: The label must be unique in the keystore. Also, it cannot be the label of the expired
server certificate from the SKLM key server.
6. Copy the updated client keystore to all nodes in the IBM Spectrum Scale cluster.
7. Reload the new client keystore by one of the following methods:
v On any administration node in the cluster, run the mmchpolicy command to refresh the current
policy rules. You do not need to repeat this action on other nodes in the cluster.
v On each node of the cluster, unmount and mount the file system.
v In IBM Spectrum Scale v4.2.1 and later, issue the following command on each node of the cluster:
/usr/lpp/mmfs/bin/tsloadikm run
The IBM Spectrum Scale client now trusts the new SKLM server certificate chain.
where:
--host <dsmhost>
Is the IP address or host name of the DSM server.
--port <dsmport>
Is the port number of the DSM web GUI. The default value is 8445.
--prefix <sklmChain>
Is the path and file name prefix where the server certificate files are to be stored.
DSM server certificate chain: The DSM server certificate chain typically consists of two certificates, a
DSM internal root CA certificate and an endpoint certificate. The names of certificate files that you
retrieve in this step have the following format: the path and file name prefix that you specify in the
--prefix parameter, followed by a 0 for the root certificate or a 1 for the endpoint certificate, followed
by the suffix .cert. In the following example, the prefix is /root/dsmChain:
/root/dsmChain0.cert
/root/dsmChain1.cert
4. Optionally, print the contents of the retrieved server certificate files and verify that the information
matches the information in the new server certificate on the DSM server. The mmgskkm command is
available in IBM Spectrum Scale v4.2.1 and later. Issue the following commands:
mmgskkm print --cert <dsmChain>0.cert
mmgskkm print --cert <dsmChain>1.cert
where dsmChain is the path and file name prefix of the certificate files that you retrieved in Step 3.
5. Issue the following command to add the new DSM server certificate chain to the client keystore. The
mmgskkm command is available in IBM Spectrum Scale v4.2.1 and later.
mmgskkm trust --prefix <dsmChain> --out <rkmKeystore> --pwd-file <rkmPassfile>
--label <serverLabel>
where:
--prefix <dsmChain>
Is the path and file name prefix of the certificate chain files that you retrieved in Step 3, such
as /root/dsmChain.
--out <rkmKeystore>
Is the path and file name of the client keystore from Step 1.
--pwd-file <rkmPassfile>
Is the path and file name of the keystore password file that you created in Step 2.
--label <serverLabel>
Is the label under which to store the server certificate in the client keystore.
Note: The label must be unique in the keystore. Also, it cannot be the label of the expired
server certificate from the DSM key server.
6. Copy the updated client keystore to all nodes in the IBM Spectrum Scale cluster.
7. Reload the new client keystore by one of the following methods:
v On any administration node in the cluster, run the mmchpolicy command to refresh the current
policy rules. You do not need to repeat this action on other nodes in the cluster.
v On each node of the cluster, unmount and mount the file system.
The IBM Spectrum Scale client now trusts the new self-signed Vormetric DSM server certificate.
c1Client1
Label: c1Client1
Key Server: keyserver01
Tenants: (none)
3. Optionally, issue the following command and make a note of the RKM ID that is associated with the
old key client. If you reuse the RKM ID of the old key client when you register the new key client,
then you do not have to update any of your encryption policy rules that specify the RKM ID:
# mmkeyserv tenant show
devG1
Key Server: keyserver01.gpfs.net
Registered Client: c1Client0
RKM ID: keyserver01_devG1
See Step 5.
4. Issue the following command to deregister the current key client from the tenant. Notice that this
command also deletes the expired certificate:
# mmkeyserv client deregister c1Client0 --tenant devG1
Enter password for the key server:
Enter password for the key server of client c1Client0:
mmkeyserv: Deleting the following KMIP certificate with label:
15826749741870337947_devG1_1498047851
Note: Here you can specify the RKM ID of the old key client to avoid having to update encryption
policy rules that reference that RKM ID. See Step 3.
# mmkeyserv client register c1Client1 --tenant devG1 --rkm-id keyserver01_devG1
Enter password for the key server:
mmkeyserv: [I] Client currently does not have access to the key.
Continue the registration process ...
mmkeyserv: Successfully accepted client certificate
For more information, see the topic mmkeyserv command in the IBM Spectrum Scale: Command and
Programming Reference.
where:
--prefix <prefix>
Is the path and file name prefix of the new certificate files and keystore file.
--cname <cname>
Is the name of the new IBM Spectrum Scale key client. The name can be up to 54 characters in
length and can contain alphanumeric characters, hyphen (-), and period (.). In Vormetric DSM,
names are not case-sensitive, so it is a good practice not to include uppercase letters.
--fips <fips>
Is the current value of the FIPS1402mode configuration variable in IBM Spectrum Scale. Valid
values are yes and no. Issue the following command to see the current value:
mmlsconfig FIPS1402mode
--nist <nist>
Is the current value of the nistCompliance configuration variable in IBM Spectrum Scale. Valid
values are SP800-131A and off. To see the current value, issue the following command:
mmlsconfig nistCompliance
--days <validdays>
Is the number of days that you want the client certificate to be valid.
--keylen <keylen>
Is the length in bits that you want for the RSA key that is generated.
3. Issue the following command to create a PKCS#12 keystore and to store the client certificate and
private key into it. The mmgskkm command is available in IBM Spectrum Scale v4.2.1 and later.
mmgskkm store --cert <certFile> --priv <privFile> --label <label>
--pwd-file <pwd-file> --out <keystore>
where:
where:
--host <rkmHost>
Is the IP address or host name of the RKM server.
--port <rkmPort>
Is the port of the RKM server:
v For SKLM, the port is the KMIP port, which has a default value of 5696.
v For DSM, the port is the web GUI port, which has a default value of 8445.
--prefix <serverPrefix>
Is the path and file name prefix for the RKM certificate chain.
--keystore <keystore>
Is the path and file name of the PKCS#12 keystore that you created in Step 3.
--keypass <pwd-file>
Is the path and file name of the keystore password file that you created in Step 1.
--fips <fips>
Is the current value of the FIPS1402mode configuration variable in IBM Spectrum Scale. Valid
values are yes and no. Issue the following command to see the current value:
mmlsconfig FIPS1402mode
--nist <nist>
Is the current value of the nistCompliance configuration variable in IBM Spectrum Scale.
Valid values are SP800-131A and off. To see the current value, issue the following command:
mmlsconfig nistCompliance
5. Optionally, print the contents of the server certificate file and verify that the information matches the
information that is displayed for the current server certificate in the RKM GUI. The mmgskkm
command is available in IBM Spectrum Scale v4.2.1 and later. You might need to print more than
one server certificate file:
mmgskkm print --cert <serverPrefix>0.cert
mmgskkm print --cert <serverPrefix>1.cert
where serverPrefix is the path and file name prefix of the certificate chain that you specified in Step
4.
where:
--prefix <serverPrefix>
Is the path and file name prefix for the RKM certificate chain that you retrieved in Step 4.
--out <keystore>
Is the path and file name of the client keystore that you created in Step 3.
--pwd-file <pwd-file>
Is the path and file name of the keystore password file that you created in Step 1.
--label <serverLabel>
Is the label under which you want to store the RKM certificate chain in the client keystore.
7. Update the RKM stanza for the new client credentials in the /var/mmfs/etc/RKM.conf file. Make sure
that the following values are correct:
v The keyStore term specifies the path and file name of the client keystore that you created in Step
3.
v The passphrase term specifies the keystore password from Step 1.
v The clientCertLabel term specifies the label of the new client certificate from Step 3.
8. Copy the updated /var/mmfs/etc/RKM.conf file and the new client keystore file to all the nodes of
the cluster.
9. Reload the new client keystore by one of the following methods:
v On any administration node in the cluster, run the mmchpolicy command to refresh the current
policy rules. You do not need to repeat this action on other nodes in the cluster.
v On each node of the cluster, unmount and mount the file system.
v In IBM Spectrum Scale v4.2.1 and later, issue the following command:
/usr/lpp/mmfs/bin/tsloadikm run
This action ensures that subsequent reads and writes to files use the new client credentials.
Follow these instructions if you are using SKLM and the Regular setup method and you have created
and installed new client credentials.
1. Add the new client certificate to the SKLM list of pending certificates:
a. On the node that you are configuring for encryption, try to create an encrypted file by doing some
action that triggers an encryption policy rule.
b. The attempt fails because SKLM does not yet trust the new client certificate. However, the attempt
causes SKLM to add the new client certificate to the list of pending certificates in the SKLM key
server.
Follow these instructions if you are using a Vormetric Data Security Manager (DSM) key server and
the Regular setup method and you have created a new client certificate and imported its information into
the current IBM Spectrum Scale policy rules.
1. In the DSM web GUI, import the new client certificate into the DSM server. Provide the path and file
name of the certificate file that you created in Step 2 and referenced in Step 3 of the subtopic “All
other scenarios: Creating and installing new client credentials” on page 644. The path and file name
have the format <prefix>.cert, where <prefix> is the path and file name prefix that you specified in
Step 2.
2. On the node that you are configuring for encryption, try to create an encrypted file by doing some
action that triggers an encryption policy rule. The file is successfully created.
| Encryption hints
| Find useful hints for working with file encryption.
| To test whether a file is encrypted by IBM Spectrum Scale, do one of the following actions:
| v In a policy, use the following condition:
| XATTR('gpfs.Encryption') IS NOT NULL
| v Issue the mmlsattr command with the -L option for the file. The command displays output as shown
| in the following example. The line of the output that begins with the label Encrypted indicates
| whether the file is encrypted:
| # mmlsattr -L textReport
| name: textReport
| metadata replication: 1 max 2
| data replication: 1 max 2
| immutable: no
| appendOnly: no
| flags:
| storage pool name: system
| fileset name: root
| snapshot name:
| creation time: Tue Jun 12 15:40:30 2018
| Misc attributes: ARCHIVE
| Encrypted: yes
| For more information, see the topic mmlsattr command in the IBM Spectrum Scale: Command and
| Programming Reference.
|
Secure deletion
Secure deletion refers to both erasing files from the file system and erasing the MEKs that wrapped the
FEKs that were used to encrypt the files.
After files have been removed from a fileset using standard file system operations (such as unlink and
rm), the tenant administrator might decide to securely delete them. For example, suppose that until that
point, the FEKs of all files in the fileset were encrypted with the MEK with key name KEY-old:isklmsrv.
To cause the secure deletion of all removed files, the administrator must perform the following steps:
1. Create a new MEK and note its key name (in this example, KEY-new:isklmsrv).
2. Modify the appropriate encryption policy KEYS statement in the encryption policy to encrypt new
files with the new MEK (for example, KEY-new:isklmsrv) instead of the old one (KEY-old:isklmsrv).
3. Create and apply a migration (rewrapping) policy (CHANGE ENCRYPTION KEYS) to scan all files,
unwrap the wrapped FEK entries of files that have been wrapped with the old key
(KEY-old:isklmsrv), and rewrap them with the new key (KEY-new:isklmsrv); this step ensures that the
FEKs of existing files will be accessible in the future. (A sketch of such a rewrapping rule follows this list.)
| Tip: The mmapplypolicy command always begins by scanning all of the files in the affected file system
| or fileset to discover files that meet the criteria of the policy rule. In this example, the criterion is
| whether the file is encrypted with a FEK that is wrapped with the MEK KEY-old:isklmsrv. If your file
| system or fileset is very large, you might want to delay running mmapplypolicy until a time when the
| system is not running a heavy load of applications. For more information, see the topic “Phase one:
| Selecting candidate files” on page 391.
4. Remove the old key, KEY-old:isklmsrv. This step commits the secure deletion of all files that were
previously unlinked (and whose FEKs had therefore not been rewrapped with the new MEK,
KEY-new:isklmsrv).
5. On each node that has ever done I/O to a file encrypted with the old key (KEY-old:isklmsrv), run the
following command:
/usr/lpp/mmfs/bin/tsctl encKeyCachePurge 'KEY-old:isklmsrv'
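The following is a minimal sketch of the rewrapping rule that Step 3 refers to, together with an
mmapplypolicy invocation that applies it. The rule name, policy file name, and file system name are
illustrative; verify the exact CHANGE ENCRYPTION KEYS syntax against the encryption policy rules
documentation before you use it:
RULE 'rewrapOldMEK' CHANGE ENCRYPTION KEYS ('KEY-old:isklmsrv', 'KEY-new:isklmsrv')
Save the rule in a file, for example /tmp/rewrap.pol, and apply it with the mmapplypolicy command:
mmapplypolicy gpfs1 -P /tmp/rewrap.pol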
Note: The mmdelfs command will not perform any secure deletion of the files in the file system to be
deleted. mmdelfs only removes all the structures for the specified file system. To securely delete files, you
need to perform the following steps:
1. Identify all MEKs currently used to wrap the FEKs of files in the file system to be deleted. If this
information is not available through other means, obtain it by doing the following:
a. Invoke mmlsattr -n gpfs.Encryption on all files of the file system.
b. Parse the resulting output to extract all the distinct key names of the MEKs that are used.
Note: These are the possible ways that an MEK might be in use in a file system:
a. The MEK is, or was at some point, specified in an encryption rule in the policy set on the file
system.
b. An FEK rewrap has been run, rewrapping an FEK with another MEK.
2. Determine whether the identified MEKs were used to wrap FEKs in other file systems.
WARNING: If the same MEKs were used to wrap FEKs in other file systems, deleting those MEKs
will result in irreparable data loss in the other file systems where those MEKs are used. Before
deleting such MEKs from the key servers, you must create one or more new MEKs and rewrap the
files in the other file systems.
3. After appropriately handling any MEKs that were used to wrap FEKs in other file systems (as
explained in the warning), delete the identified MEKs from their RKMs.
The key servers that store the MEKs know how to manage and securely delete keys. After an MEK is
gone, all files whose FEKs were encrypted with that MEK are no longer accessible. Even if the data
blocks corresponding to the deleted files are retrieved, the contents of the file can no longer be
reconstructed, since the data cannot be decrypted.
However, if the MEKs have been cached for performance reasons (so that they do not have to be fetched
from the server each time a file is created or accessed), the MEKs must also be purged from the cache to
complete the secure deletion.
You can use the following command to purge a given key from the key cache, or to clean the entire
cache, of an individual node:
/usr/lpp/mmfs/bin/tsctl encKeyCachePurge {Key | all}
where:
Key
is the key ID, specified with the KeyId:RkmId syntax.
all
specifies that the entire key cache is to be cleaned.
The scope of this command is limited to the local node and must be run on all nodes that have accessed
the MEKs you are purging in order to ensure secure deletion.
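Because the purge is local to each node, one convenient way to run it on many nodes at the same time is
the mmdsh utility, assuming remote shell access is configured between the nodes. The key name is the
one from the example above:
mmdsh -N all /usr/lpp/mmfs/bin/tsctl encKeyCachePurge 'KEY-old:isklmsrv'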
The FIPS1402mode configuration variable controls whether the use of crypto-based security mechanisms
(if they are to be used at all, per the IBM Spectrum Scale administrator) is to be provided by software
modules that are certified according to the requirements and standards described by the Federal
Information Processing Standards (FIPS) 140 Publication Series. When in FIPS 140-2 mode, IBM Spectrum
Scale uses the FIPS 140-2 approved cryptographic provider IBM Crypto for C (ICC) (certificate 2420) for
cryptography. The certificate is listed on the NIST website.
The value of FIPS1402mode can be changed with the mmchconfig command. The default value for this
variable is no. With FIPS1402mode=no, Linux nodes will use kernel encryption modules for direct I/O. If
a cluster is configured with FIPS1402mode=yes, Linux nodes whose kernels are not running in FIPS
mode will see a performance degradation when using direct I/O. The GPFS daemon on the node must be
restarted in order for the new setting to take effect.
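For example, the following commands set FIPS 140-2 mode for the cluster and display the resulting
value:
mmchconfig FIPS1402mode=yes
mmlsconfig FIPS1402mode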
Note: In IBM Spectrum Scale V4.2.0 and earlier, in a Power 8, little-endian environment, the setting
FIPS1402mode=no is required for the following operations:
v File encryption
v Secure communications between nodes. For more information, see the following descriptions in the
IBM Spectrum Scale: Command and Programming Reference:
– -l CipherList parameter of the mmauth command
– cipherList parameter of the mmchconfig command
v CCR enablement. For more information, see the following descriptions in the IBM Spectrum Scale:
Command and Programming Reference:
– --ccr-enable parameter of the mmchcluster command
– --ccr-enable parameter of the mmcrcluster command.
The mechanisms that are used by file encryption, including ciphers and key lengths, are compliant with
the NIST SP800-131A recommendations. See NIST Special Publication 800-131A, Revision 1 at
http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar1.pdf.
The global snapshot restore operation restores encrypted files and their FEKs and MEKs. For more
information, see the topic mmrestorefs command in the IBM Spectrum Scale Command and Programming
Reference.
As snapshots are taken of a file system or fileset that includes encrypted files, subsequent operations on
the active files and snapshots depend on the continuing availability of the MEKs for those files.
Over time, some MEKs might no longer be accessible. For example, MEKs can be deleted from the server
as a result of secure deletion. Similarly, encrypted files might be moved to a different key server and have
their FEKs rewrapped with MEKs from the new server, possibly resulting in the old server being
decommissioned.
All snapshots that include encrypted files whose MEKs will no longer be accessible must be deleted with
the mmdelsnapshot command before the current MEKs become unavailable. Otherwise, the corresponding
snapshots can no longer be removed, as is also the case for active files whose keys are no longer
available.
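For example, a snapshot that contains encrypted files whose MEKs are about to be removed can be
deleted as follows; the file system name and snapshot name are illustrative:
mmdelsnapshot gpfs1 snap_before_rewrap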
By default, IBM Spectrum Scale does not allow cleartext from encrypted files to be copied into an LROC
device. The reason is that a security exposure arises when cleartext from an encrypted file is copied into
an LROC device. Because LROC device storage is non-volatile, an attacker can capture the cleartext by
removing the LROC device from the system and reading the cleartext at some other location.
To enable cleartext from an encrypted file to be copied into an LROC device, you can issue the
mmchconfig command with the attribute LROCEnableStoringClearText=yes. You might choose this option
if you have configured your system in some way to remove the security exposure. One such method is to
install an LROC device that internally encrypts data that is written into it and decrypts data that is read
from it. But see the following warning.
Warning: If you allow cleartext from an encrypted file to be copied into an LROC device, you must take
steps to protect the cleartext while it is in LROC storage. One method is to install an LROC storage
device that internally encrypts data that is written into it and decrypts data that is read from it. However,
be aware that a device of this type voids the IBM Spectrum Scale secure deletion guarantee, because IBM
Spectrum Scale does not manage the encryption key for the device.
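For example, assuming that you have addressed the exposure as described in the warning above, you
can set and verify the attribute as follows:
mmchconfig LROCEnableStoringClearText=yes
mmlsconfig LROCEnableStoringClearText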
| Whenever encrypted files on an IBM Spectrum Scale file system are migrated to an external storage pool,
| they are decrypted before migration to the external storage pool takes place. Files are sent to the tool that
| manages the external storage in cleartext, leaving file stubs in the file system. When these migrated files
| are recalled, they are retrieved in cleartext and are subsequently re-encrypted by IBM Spectrum Scale as
| they are rewritten to disk. Typically the product software that manages the external storage provides the
| means to encrypt the cleartext data sent by IBM Spectrum Scale before writing the data to the external
| storage. Similarly the product software can decrypt the data before sending it to IBM Spectrum Scale
| when the file is recalled.
| When the stub files that are created from the migration of data to an external pool are copied to other
| locations in the file system, IBM Spectrum Scale recalls the data from the external pool if the destination
| of the copy is a different file (inode) space. For example, copying a stub file from one file system to
| another or from one independent fileset to another triggers the recall of the file data from the external
| pool. If the placement policy for the destination of the file copy requires files to be encrypted, then the
| file also is encrypted when recalled.
| For more information about external pools, see “External storage pools” on page 369.
For encryption requirements, see the topic “Preparation for encryption” on page 571.
Warning: If you allow cleartext from an encrypted file to be copied into an LROC, you must take steps
to protect the cleartext while it is in LROC storage.
During system setup, an initial self-signed certificate is created to use for secure connections between the
GUI web servers and web browsers. Based on the security requirements for your system, you can create
either a new self-signed certificate or install a signed certificate that is created by the certifying authority.
Self-signed certificates can generate web browser security warnings and might not comply with
organizational security guidelines.
Trusted certificates are created by a third-party certificate authority. These certificate authorities
ensure that certificates have the required security level for an organization based on purchase agreements.
Trusted certificates usually have higher security controls for encryption of data and do not cause browser
security warnings. Trusted certificates are also stored in the Liberty profile SSL keystore.
Major web browsers trust CA-certified certificates by default and can therefore confirm that the
certificate received from the GUI server can be trusted. You can either buy a signed certificate from a
trusted third-party authority or create your own certificate and get it certified. You can use both
self-signed and trusted certificates. However, using a trusted certificate is preferred because the browser
trusts this certificate automatically without any manual intervention.
You can use either the Services > GUI page in the GUI or the CLI to install and use the certificates.
You can use the Services > GUI page in the GUI to perform the following tasks:
1. Generate a self-signed certificate by using the Install Self-Signed Certificate option.
2. Generate a certificate and install it after getting it certified by the CA by using the Create Certificate
Request option.
3. Install an already issued certificate by using the Import Certificate option.
4. View the details of the certificate that is applied on the local GUI node by using the View Certificate
option.
You need to perform the following steps to obtain and import a signed-certificate from a trusted
certificate authority:
1. Generate a private key by issuing the following command:
openssl genrsa -out <nameOfYourKey>.key 2048
2. Generate the certificate request as shown in the following example:
openssl req -new -key <nameOfYourKey>.key -out <nameOfYourKey>.csr
The system prompts you to enter the following details:
Country Name (2 letter code) [XX]:
State or Province Name (full name) []:
Locality Name (eg, city) [Default City]:
Organization Name (eg, company) [Default Company Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:
The secured data access by clients through protocols is achieved through the following two steps:
1. Establishing secured connection between the IBM Spectrum Scale system and the authentication
server.
When the client raises an authentication request to access the data, the IBM Spectrum Scale system
interacts with the external authentication servers like Active Directory or LDAP based on the
authentication configuration. You can configure the security services like TLS and Kerberos with the
external authentication server to secure the communication channel between the IBM Spectrum Scale
system and the external authentication server.
2. Securing the data transfer.
The actual data access wherein the data transfer is made secured with the security features that are
available with the protocol that you use to access the data.
The following diagram depicts the data in transit security implementation in the IBM Spectrum Scale
system.
[Figure: a client connects to an IBM Spectrum Scale protocol node over NFSv4, SMB, or Object with
encryption, and the protocol node connects to the external authentication server over TLS.]
Secured connection between the IBM Spectrum Scale system and the
authentication server
You can configure the following authentication servers to configure file and object access:
v Microsoft Active Directory (AD)
v Lightweight Directory Access Protocol (LDAP)
v Keystone
AD and LDAP can be used as the authentication server for both file and object access. Configuring the
Keystone server is a mandatory requirement for object access to function. The Keystone server needs to
interact with the authentication server to resolve authentication requests. You can configure either an
internal or an external Keystone server for object access. The following table lists the security features
that are used to secure the corresponding authentication server.
Table 58. Security features that are used to secure the authentication server
Authentication server   Supported protocols   Security features
Active Directory        File and Object       Kerberos for file and TLS for object.
The secured data transfer over the network is based on the security features available with the protocols
that are used to access the data.
SMB protocol version 3 and later has the following capabilities to provide tighter security for the data
transfers:
1. Secured dialect negotiation
2. Improved signing
3. Secured transmission
The dialect negotiation is used to identify the highest level dialect both server and client can support. The
system administrator can enable SMB encryption by using the smb encrypt setting at the export level. The
following three modes are available for the secured SMB access:
v Automatic
v Mandatory
v Disabled
When the SMB services are enabled, the SMB encryption is enabled in the automatic mode by default.
Note: SMB supports per-export encryption, which allows the administrators to selectively enable or
disable encryption per SMB share.
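For example, to change the encryption mode of an existing export, a command of the following form
might be used; the export name is illustrative, and the mmsmb export change syntax should be verified
against the mmsmb command reference:
mmsmb export change secured_export --option "smb encrypt=mandatory"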
The IBM Spectrum Scale system provides access to the Object Storage with the help of OpenStack
Keystone Identity Service. The Keystone server that is provided by IBM Spectrum Scale is recommended
to be used only for IBM Spectrum Scale Object workload.
For secure communication between the clients and the IBM Spectrum Scale Object, the system
administrator needs to configure HAProxy for SSL termination, traffic encryption, and load balancing of
the requests to IBM Spectrum Scale Object. The HAProxy needs to be set up on an external system that is
not a part of the IBM Spectrum Scale cluster. For more information on how to configure HAProxy, see the
documentation of the corresponding Linux distribution that you selected.
Securing AD server
To secure the AD server that is used for file access, configure it with Kerberos; to secure the AD server
that is used for object access, configure it with TLS.
In the AD-based authentication for file access, Kerberos is configured by default. The following steps
provide an example on how to configure TLS with AD, while it is used for object access.
1. Ensure that the CA certificate for the AD server is placed in the /var/mmfs/tmp directory with the name
object_ldap_cacert.pem, specifically on the protocol node where the command is run. Validate that the
CA certificate is available with the desired name at the required location, as shown in the following example:
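A minimal check, assuming the default path and file name that are described above, might be:
# ls -l /var/mmfs/tmp/object_ldap_cacert.pem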
Note: The value that you specify for --servers must match the value in the TLS certificate. Otherwise
the command fails.
3. To verify the authentication configuration, use the mmuserauth service list command as shown in
the following example:
# mmuserauth service list
FILE access not configured
PARAMETERS VALUES
-------------------------------------------------
The following examples show how to configure LDAP with TLS and Kerberos to secure the LDAP server
when it is used for file and object access.
1. To configure LDAP with TLS and Kerberos as the authentication method for file access, issue the
mmuserauth service create command as shown in the following example:
# mmuserauth service create --type ldap --data-access-method file
--servers es-pune-host-01 --base-dn dc=example,dc=com
--user-name cn=manager,dc=example,dc=com
--netbios-name ess --enable-server-tls --enable-kerberos
--kerberos-server es-pune-host-01 --kerberos-realm example.com
To verify the authentication configuration, use the mmuserauth service list command as shown in
the following example:
# mmuserauth service list
To verify the authentication configuration, use the mmuserauth service list command as shown in
the following example:
# mmuserauth service list
.....
.....
7. Issue the mmnfs export list command with krb5i option to see the authentication and data integrity
configuration.
# mmnfs export list --nfsdefs /ibm/gpfs0/krb5i
The system displays output similar to this:
Path Delegations Clients Access_Type Protocols Transports Squash Anonymous_uid Anonymous_gid SecType PrivilegedPort Export_id DefaultDelegation Manage_Gids NFS_Commit
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/ibm/gpfs0/krb5i none * RW 3,4 TCP NO_ROOT_SQUASH -2 -2 KRB5I FALSE 3 none FALSE FALSE
8. Issue the mmnfs export list command with krb5p option to see the authentication and privacy
configuration.
# mmnfs export list --nfsdefs /ibm/gpfs0/krb5p
The system displays output similar to this:
You can either enable or disable encryption of the data in transit by using the mmsmb export add
command as shown in the following example:
# mmsmb export add secured_export /ibm/gpfs0/secured_export --option "smb encrypt=mandatory"
When you use Transparent cloud tiering, there are maintenance tasks that must be done. It is
recommended that you put them in a scheduler to make sure that the maintenance activity is performed.
Because these maintenance activities affect overall data throughput, schedule them one at a time (do not
schedule them simultaneously) during non-peak demand times.
1. Reconcile your files once a month to make sure that the cloud directories that are maintained by
Transparent cloud tiering services and the file system are synchronized.
2. Do a full backup of the cloud directory once a month to allow for faster and cleaner handling of
disaster recovery and service problems.
3. Run the cloud destroy utility to remove from the cloud system the files that have been deleted from
the file system.
These steps must be run for each Transparent cloud tiering container in the file system.
Note: You do not have to perform these actions for inactive containers that are not being migrated to
(and have no delete activity).
See the information below for detailed instructions on how to perform these maintenance steps.
After a cloud account is configured, you can apply an ILM policy file to configure a cloud storage tier.
The policy configuration is done by using IBM Spectrum Scale standard ILM policy query language
statements.
For more information on ILM policies, see Chapter 26, “Information lifecycle management for IBM
Spectrum Scale,” on page 363 .
You must create a policy and then apply this policy on the Cloud services node for the ILM-based
migration and recall to work for the cloud storage tier.
Note: Administrators must consider appropriate high and low disk utilization threshold values that are
applicable in the data center environment.
A sample policy rule and the steps to apply the policy on a node are as follows:
/* Sample policy.rules file for using Gateway functionality */
/* Define an external pool for the off-line storage */
define(
exclude_list,
(
FALSE
OR PATH_NAME LIKE '%/.mcstore/%'
For more information on how to work with the external storage pools and related policies, see “Working
with external storage pools” on page 403.
Note: Ensure that only a single instance of the policy is applied to migrate data to the external cloud
storage pool. This avoids any potential locking issues that might arise due to multiple policy instances
that try to work on the same set of files.
To ensure proper invocation of the policy on reaching threshold limits, see Threshold based migration
using callbacks example.
In the sample policy, the 'OpenRead' and 'OpenWrite' rule sections represent the transparent recall of a
migrated or non-resident file. The Transparent cloud tiering software adds its own extended attributes
(dmapi.MCEA) to each file it processes. Displacement 5 in the extended attributes indicates the resident
state of the file. If it is 'N' (non-resident), the policy issues a recall request to bring back the data from the
cloud storage to the local file system for the requested read or write operation.
To apply a threshold policy to a file system, see “Using thresholds to migrate data between pools” on
page 400.
IBM Spectrum Scale also gives administrators a way to define policies to identify the files for migration,
and apply those policies immediately using the mmapplypolicy command. This is different from the
threshold-based policies (which are applied by using the mmchpolicy command). The Transparent cloud
tiering service currently does not support parallelism in migrating files simultaneously, but parallelism in
the mmapplypolicy command can be used to improve the overall throughput. Additionally, parallelism can
be achieved by using an ILM policy to migrate data or by driving separate, parallel CLI commands.
where:
v gpfs0 indicates the IBM Spectrum Scale file system
v -m indicates the number of threads created and dispatched during policy execution phase. Use the
mmcloudgateway command configuration tuning settings to set your migrate or recall thread counts.
Note: You must know the number of processors that are available on your Transparent cloud tiering
service node.
v -B indicates the maximum number of files passed to each invocation of the EXEC script specified in the
<rules.file>
Note: These two parameters (-m and -B) can be adjusted to improve the performance of large scale
migrations.
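For example, an invocation of the mmapplypolicy command that uses these tuning options might look
like the following; the file system name, rules file name, and values are illustrative:
mmapplypolicy gpfs0 -P policy.rules -m 16 -B 100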
The following sample policies are available in the package in the /opt/ibm/MCStore/samples folder:
Table 59. Sample policy list
1 cloudDestroy.policy.template
Apply this policy for manually destroying orphaned cloud objects before the retention time expires.
2 coresidentMigrate.template
Apply this policy for migrating files in the co-resident state, so that applications do not need to
frequently recall files.
3 coResidenttoResident.template
Apply this policy if you want to convert all "co-resident" files in a file system to "resident".
CoresToNonres.sobar.template
This is used during SOBAR restore to update the extended attributes (EAs) of co-resident files to
non-resident, so that they can be recalled on the SOBAR restored site. Not required to be used
outside SOBAR.
exportfiles.policy.template
This is used to show how to export files from a given path; similar to migrateFromDirectory.template.
Can be used by customers.
4 listMigratedFiles.template
This policy will list all co-resident and resident files in the file system.
5 migrateFromDirectory.policy.template
This policy migrates all files in a specified directory to the cloud storage tier.
migrateToSpecificCloudService.policy.template
This is used to show how to use a particular cloud service, to migrate files via policy to a particular
cloud tier. Can be used by customers.
6 recallFromCloud.policy.template
Apply this policy to recall files from the cloud storage tier.
7 thresholdBasedMigration.policy.template
Apply this policy to automatically migrate files from the file system to the cloud storage upon
reaching certain threshold levels.
8 thumbnailTransparentRecall.policy.template
This policy will help you display the thumbnails when files are listed in tools such as Windows Explorer.
9 transparentRecall.policy.template
Transparent recall pulls files from the cloud when they are accessed by an application (read or write).
Migrating files to the cloud storage tier
This topic provides a brief description on how to migrate files to the cloud storage tier by using
Transparent cloud tiering.
Note: Before you try to migrate files to the cloud storage tier, ensure that your cloud service
configuration is completed as summarized in Chapter 6, “Configuring and tuning your system for Cloud
services,” on page 55.
You can trigger migration of files from your file system to an external cloud storage tier either
transparently or manually. Transparent migration is based on the policies that are applied on a file
system. Data is automatically moved from the system pool to the configured external storage tier when
the system pool reaches a certain threshold level. A file can be automatically migrated to or from cloud
storage pool based on some characteristics of the file such as age, size, last access time, path.
Alternatively, the user can manually migrate specific files or file sets to a cloud storage pool. For more
information on policy-based migration, see “Applying a policy on a Transparent cloud tiering node” on
page 667.
The state of the file becomes Non-resident after it is successfully migrated to the cloud storage tier.
If you want to migrate files in the co-resident state, where the file is copied to the cloud but the data is
also retained in the file system, see "Pre-migrating files to the cloud storage tier."
For more information on manually migrating files to the cloud storage tier, see the mmcloudgateway
command in the IBM Spectrum Scale: Command and Programming Reference.
Normally, when a file is migrated to the cloud storage tier, the status of the file becomes non-resident.
This means that, you need to completely recall the file from the cloud to be able to perform any read or
write operation. This might be an issue when the data is warm. The calling application must recall the
file every time it needs to perform any operation on the file, and this can be resource-intensive. The
solution to this issue is to migrate files in the co-resident state, also called pre-migration.
In this type of migration, irrespective of the file size, when the files are warm, they are archived to the
cloud storage in the co-resident state. That allows applications to have continued access to the files
without issuing any recall commands, but at the same time, ensures that data is at least available on the
cloud if there is any type of disaster. As data gets colder, the files are migrated in the non-resident state.
Because files are available both in the file system and on the cloud, the storage utilization is higher.
Small files whose data resides entirely in the inodes are migrated in the co-resident state even if they are
cold.
Note: When files that need to be moved immediately to the cloud tier are pushed through NFS, be aware
that CES NFS keeps NFSv3 files open, and keeps them open indefinitely, for performance reasons. Any
files that are cached in this manner will not be migrated by Transparent cloud tiering to the cloud tier.
NFSv4 client caching is more measured and less likely to prevent files from being migrated to the cloud
tier, and is recommended for this sort of usage.
You can migrate files in the co-resident state by using a policy as well as the CLI.
To verify that the file is migrated in the co-resident state, issue the following command:
mmcloudgateway files list file1
For transparent migration in the co-resident state, the following policy has to be applied to the files by
using the mmchpolicy command. A sample policy is available here: /opt/ibm/MCStore/samples/
coresidentMigrate.template:
/*******************************************************************************
* Licensed Materials - Property of IBM
*
* OCO Source Materials
*
* (C) Copyright IBM Corp. 2017 All Rights Reserved
*
* The source code for this program is not published or other-
* wise divested of its trade secrets, irrespective of what has
* been deposited with the U.S. Copyright Office.
*******************************************************************************/
define(
exclude_list,
(
FALSE
OR PATH_NAME LIKE '%/.mcstore/%'
OR PATH_NAME LIKE '%/.mcstore.bak/%'
)
)
/* Define premigrate pool, where files are migrated in co-resident state. This represents files moved
to cloud but also available locally on Scale file system.
* It is to be used for warmer data, as that data needs to be available locally on Scale file system
too, to avoid cloud round trips.
*/
RULE EXTERNAL POOL 'premigrate' EXEC '/usr/lpp/mmfs/bin/mmcloudgateway files' OPTS '--co-resident-state -F'
/* Define migrate pool, where files are migrated in non-resident state. This represents files that are moved
to cloud and are not available locally.
* It is to be used for colder data depending on file size. Larger colder files are made non-resident,
whereas smaller files (less than 4K) are kept co-resident.
*/
RULE EXTERNAL POOL 'migrate' EXEC '/usr/lpp/mmfs/bin/mmcloudgateway files' OPTS '-F'
/* This rule defines movement of warm data. Each file (irrespective of its size) is moved to the cloud
in a co-resident state.
* It means the file is available on the cloud, and access to it is possible from the hot-standby site
if needed.
* Here the sample time interval that indicates warm data is data that has not been accessed for 10
to 30 days.
* We don't want to pick up HOT data that has been accessed in the last 10 days.
* Another advantage of this co-resident migration is that when data eventually gets colder, since it
is already migrated to the cloud, only file truncation happens later.
*/
RULE 'MoveWarmData' MIGRATE FROM POOL 'system'
THRESHOLD(0,0)
TO POOL 'premigrate'
WHERE NOT(exclude_list) AND
(CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '10' DAYS) AND
(CURRENT_TIMESTAMP - ACCESS_TIME < INTERVAL '30' DAYS)
/* This rule defines the movement of large files that are cold. Here, files that are above 4KB in size
are made non-resident to save
* space on the Scale file system. Files that are smaller than 4KB are stored in the inode
block itself anyway.
*/
RULE 'MoveLargeColdData' MIGRATE FROM POOL 'system'
THRESHOLD(0,0)
TO POOL 'migrate'
WHERE(KB_ALLOCATED > 4) AND NOT(exclude_list) AND
(CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '30' DAYS)
/* This rule defines the movement of smaller files that are cold. Here, files that are less than
4KB in size are made co-resident, because
* there is no saving in moving these files; the data resides within the inode block, not
on disk. This avoids unnecessary recall cycles.
*/
RULE 'MoveSmallColdData' MIGRATE FROM POOL 'system'
THRESHOLD(0,0)
TO POOL 'premigrate'
WHERE(KB_ALLOCATED < 4) AND NOT(exclude_list) AND
(CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '30' DAYS)
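To apply a customized copy of this policy, copy the template, adjust it for your environment, and install
it with the mmchpolicy command. The following is a minimal sketch that assumes a file system named
gpfs0 and a customized copy saved as /tmp/coresidentMigrate.policy:
cp /opt/ibm/MCStore/samples/coresidentMigrate.template /tmp/coresidentMigrate.policy
mmchpolicy gpfs0 /tmp/coresidentMigrate.policy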
You can trigger recall of files from the cloud storage tier either transparently or manually. Transparent
recall is based on the policies that are applied on a file system. Data is automatically moved from the
cloud storage tier to the system pool when the system pool reaches a certain threshold level. A file can be
automatically recalled from the cloud storage tier based on characteristics of the file such as age,
size, last access time, or path. Alternatively, the user can manually recall specific files or filesets. For more
information on policy-based recall, see “Applying a policy on a Transparent cloud tiering node” on page
667.
You can enable or disable transparent recall for a container when a container pair set is created. For more
information, see “Binding your file system or fileset to the Cloud service by creating a container pair set”
on page 63.
Note: As with recalls in IBM Spectrum Archive and IBM Spectrum Protect for Space Management (HSM),
a Transparent cloud tiering recall fills the file with uncompressed data, and the user needs to re-compress
it by using mmrestripefs or mmrestripefile if so desired. Because the compression feature is currently
positioned for cold data, the fact that a file has been recalled means that the file is no longer cold, and
leaving the file uncompressed allows better performance for active files.
The state of the file becomes Co-resident after it is successfully recalled. If the recalled file no longer
needs to be kept in the cloud tier, it can be deleted. For more information on deleting a file in the
co-resident state, see “Cleaning up files transferred to the cloud storage tier” on page 674.
For more information on manually recalling files, see mmcloudgateway command in the IBM Spectrum
Scale: Command and Programming Reference.
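For example, the following is a minimal sketch of a manual recall of a single file; the file name is
illustrative:
mmcloudgateway files recall /gpfs0/dir1/file1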
Reconciling files between IBM Spectrum Scale file system and cloud
storage tier
This topic describes how to reconcile files that are migrated between IBM Spectrum Scale file systems and
the cloud tier. The reconcile function runs automatically as part of maintenance activities. While it is
possible to run reconcile from the CLI, it is generally not necessary to do so.
Note: To run reconcile on a given Transparent cloud tiering managed file system, ensure that enough
storage capacity is available temporarily under the root file system to allow a policy scan of the file system.
The purpose of reconcile is to ensure that the cloud database is properly aligned with the IBM Spectrum
Scale file system regarding the state of files that have been tiered to the cloud. Discrepancies can occur
because of power outages and other such failures. It is recommended that this command be run every
couple of months. The command needs to be run on every container pair. It should not be run in parallel
with other maintenance commands (such as a full cloud database backup) or with migration policies that
affect that particular container. In particular, this command should not be run while a policy migration is
in progress.
There is another reason that you might want to run reconcile. Although a policy is in place to
automatically delete files in the cloud that have been deleted in the file system, and similar support exists
for older versions of files, that support is not fully guaranteed to remove a file. When, for legal or other
reasons, there is a critical need to know for sure that a file has been deleted from the cloud, it is
recommended that you run the reconcile command as shown below.
For example:
mmcloudgateway files reconcile --container-pair-set-name MyContainer gpfs_Container
...
Wed Nov 15 14:13:15 EST 2017 Processed 92617766 entries out of 92617766.
Wed Nov 15 14:13:19 EST 2017 Reconcile found 228866 files that had been
migrated and were not in the directory.
Wed Nov 15 14:13:19 EST 2017 Reconcile detected 0 deleted files that were
deleted more than 30 days ago.
Wed Nov 15 14:13:19 EST 2017 Reconcile detected 12 migrated files that have
been deleted from the local file system, but have not been deleted from object
storage because they are waiting for their retention policy time to expire.
Wed Nov 15 14:13:19 EST 2017 Please use the ’mmcloudgateway files cloudList’
command to view the progress of the deletion of the cloud objects.
Wed Nov 15 14:13:21 EST 2017 Reconcile successfully finished.
mmcloudgateway: Command completed.
gpfs_Container is the device name of the file system that is associated with the node class, and
MyContainer is the container where the cloud objects are stored.
You can delete files from cloud storage by using the deletion policy manager. However, you can also
guarantee deletion by using a reconcile to manage the mandatory deletions. For example, if a migrated
file is removed from the file system, a reconcile guarantees removal of the corresponding cloud objects
and references that are contained in the cloud directory. Additionally, if multiple versions of a file are
stored on the cloud, reconcile removes all older cloud versions (keeping the most recent). For example, if
Chapter 39. Cloud services: Transparent cloud tiering and Cloud data sharing 673
a file is migrated, then updated, and migrated again. In this case, two versions of the file are stored on
the cloud. Reconcile removes the older version from the cloud. Reconcile also deletes cloud objects that
are no longer referenced.
Note: Reconcile removes entries from the cloud directory that reference deleted file system objects.
Therefore, it is recommended that you restore any files that must be restored before you run a reconcile.
It is also recommended to run the reconciliation operation as a background activity during periods of low
load on the Transparent cloud tiering service nodes.
To clean up files on cloud storage that have already been deleted from IBM Spectrum Scale, see “Deleting
cloud objects.”
To do basic cleanup of objects that are transferred to the cloud object storage by using Transparent cloud
tiering, issue a command according to this syntax:
mmcloudgateway files delete
{--delete-local-file | --recall-cloud-file |
--require-local-file} [--keep-last-cloud-file]
[--] File [File ...]
where,
v --recall-cloud-file: When this option is specified, the files are recalled from the cloud storage before
deleting them on the cloud. The status of the local files becomes resident after the operation.
v --delete-local-file: This option deletes both local files and the corresponding cloud object. There is no
recall here.
v --keep-last-cloud-file: This option deletes all the versions of the file except the last one from the cloud.
For example, if a file has three versions on the cloud, then versions 1 and 2 are deleted and version 3 is
retained.
v --require-local-file: This option removes the extended attribute from a co-resident file and makes it
resident, without deleting the corresponding cloud objects. The option requires the file data to be
present on the file system and will not work on a non-resident file.
v --File: This option can be used to process a file list similar to the one generated by the ILM policy.
The mmcloudgateway files delete command accepts files in GPFS file system as an input.
You can delete files by using the mmcloudgateway files delete command or by using external commands
such as rm. With any of these commands, the files are deleted only from the local file system; the
corresponding cloud objects are just marked for deletion. These marked objects are retained on the cloud
for 30 days by default. You can, however, modify the retention time by issuing the mmcloudgateway config
set command. After the retention period expires, the marked files are permanently deleted from the
cloud storage tier.
It is recommended that you apply the destroy policy described below because of how file deletion works.
For example, when you delete files by using external commands, the cloud objects are immediately
marked for deletion only if you have applied the destroy policy to the file system by using the
mmchpolicy command. If the destroy policy is not applied, the cloud objects are marked for deletion only
when you run the reconcile operation. The destroy policy is available here: /opt/ibm/MCStore/samples/
cloudDestroy.policy.template. Additionally, it is recommended to apply the destroy policy along with
other policies such as transparent recall and migration.
If you want to permanently delete the marked files before the retention time expires, you can use the
mmcloudgateway files destroy command.
For example, to set the retention period of the cloud objects to 60 days, issue the following command:
mmcloudgateway config set --cloud-retention-period-days 60
You can permanently delete the cloud objects that are marked for deletion from the cloud automatically
by using the destroy policy or the reconcile command, but this automatic deletion happens only after the
objects have been marked for 60 days. If you want to delete these objects earlier than 60 days (for
example, after 30 days), specify the following command:
mmcloudgateway files destroy --cloud-retention-period-days 30 --container-pair-set-name container-1
--filesystem-path /gpfs/myfold
Out of all the files marked for deletion, the command deletes the cloud objects that were marked for
deletion 30 or more days ago. The cloud objects that were marked for deletion less than 30 days ago are
retained.
For more information on the destroy options, see the mmcloudgateway man page.
Cloud services manages reversioned files in the same way that it manages deleted files. The cloud
destroy utility automatically deletes older versions depending on the retention time associated with each version.
For example, on day #1, you create a file and migrate it to the cloud. This is version 1. The retention time
is NOT associated with this file yet as there are no other versions of this file. On day #2, you recall the
file, modify it, and migrate it back to the cloud. This is version 2. As soon as this version is created, the
retention period is applicable to the previous version (version 1). On day #6, you again recall the file,
modify it, and migrate it back to the cloud. This is version 3. Once this version is created, the retention
period is applicable to the previous version (version 2). Now, you have a total of 3 versions of the file on
your cloud storage.
The following sequence of events occurs, assuming a retention period of 30 days:
v On day #30, all 3 versions of the file are on the cloud.
v On day #31, all 3 versions of the file are on the cloud.
v On day #32, all 3 versions of the file are on the cloud.
v On day #33, version 1 is deleted because its retention time has expired.
v On day #34, versions 2 and 3 are on the cloud.
v On day #35, versions 2 and 3 are on the cloud.
v On day #36, versions 2 and 3 are on the cloud.
v On day #37, version 2 is deleted because its retention time has expired.
v On day #38, version 3 is on the cloud.
Note: There is not a separate retention policy for managing the reversioned files versus deleted files. The
number of days retained is the same for both as they both rely on the same policy value.
Listing files migrated to the cloud storage tier
Even if the files are deleted from the file system after migration, you can generate a list of files that are
migrated to the cloud storage tier. By using the file names, you can use the mmcloudgateway restore
option to retrieve the files back from the cloud storage tier.
To list the files that are migrated to the cloud, issue a command according to this syntax:
mmcloudgateway files cloudList {--path Path [--recursive [--depth Depth]] [--file File] |
--file-versions File |
--files-usage --path Path [--depth Depth] |
--reconcile-status --path Path |
--path Path --start YYYY-MM-DD[-HH:mm] --end YYYY-MM-DD[-HH:mm]}
Note: You can specify --reconcile-status only if one reconcile is running at a time. (You can run
multiple reconciles in parallel, but the progress indication has this limitation.)
For example, to list all files in the current directory, issue this command:
mmcloudgateway files cloudList --path /gpfs0/folder1
To list all files in all directories under the current directory, issue this command:
mmcloudgateway files cloudList --path /gpfs0/folder1 --recursive
To find all files named myfile in all directories under the current directory, issue this command:
mmcloudgateway files cloudList --path /gpfs0/folder1 --file myfile
To find all files named myfile in the current directory, issue this command:
mmcloudgateway files cloudList --path /gpfs0/folder1 --depth 0 --file myfile
To display information about all versions of file myfile in current directory, issue this command:
mmcloudgateway files cloudList --file-versions myfile
Restoring files
This topic provides a brief description on how to restore files that have been migrated to the cloud
storage tier if the original files are deleted from the GPFS file system.
This option provides a non-optimized (emergency) support for manually restoring files that have been
migrated to the cloud storage tier if the original stub files on the GPFS file system are deleted.
Note: Transparent cloud tiering does not save off the IBM Spectrum Scale directory and associated
metadata such as ACLs. If you want to save off your directory structure, you need to use something other
than Transparent cloud tiering.
Before restoring files, you must identify and list the files that need to be restored by issuing the
mmcloudgateway files cloudList command.
Assume that the file, afile, is deleted from the file system but is present on the cloud, and you want to
find out what versions of this file are there on the cloud. To do so, issue the following command:
mmcloudgateway files cloudList --file-versions /gpfs0/afile
You can use the output of the cloudList command for restoring files. For more information on the
cloudList command, see “Listing files migrated to the cloud storage tier.”
By using this command, you can restore files in two different ways: the files to be restored, along with
their options, can be specified either at the command line or in a separate file that is provided with the
-F option.
If you want to specify the options in a file, create a file with the following information (one option per
line):
filename=<name of the file to be retrieved>
target=<full path where the file is to be restored>
id=<unique ID that is given to each version of the file; this information is available in the
cloudList output>
If the id is not given, the latest version of the file is retrieved.
The following example shows how the content needs to be provided in a file (for example,
filestoberestored) for restoring a single file /gpfs0/afile with multiple versions:
# Restoring filename /gpfs0/afile
filename=/gpfs0/afile
target=/gpfs0/afile-33
id=33
%%
filename=/gpfs0/afile
target=/gpfs0/afile-34
id=34
%%
# Restoring filename /gpfs0/afile
filename=/gpfs0/afile
target=/gpfs0/afile-35
id=35
%%
# Restoring filename /gpfs0/afile
filename=/gpfs0/afile
target=/gpfs0/afile-latest
%%
# Restoring filename /gpfs0/afile
filename=/gpfs0/afile
The following example shows how the content needs to be provided in a file (for example,
filestoberestored) for restoring the latest version of multiple files (file1, file2, and file3):
# Restoring filename /gpfs0/file1, /gpfs0/file2, and /gpfs0/file3
filename=/gpfs0/file1
target=/gpfs0/file1
%%
filename=/gpfs0/file2
target=/gpfs0/file2
%%
filename=/gpfs0/file3
target=/gpfs0/file3
Entries for the files to be restored are separated by lines that contain %%, and lines that begin with #
are comments.
Now that you have created a file with all required options, you need to pass this file as input to the
mmcloudgateway files restore command, as follows:
mmcloudgateway files restore -F filestoberestored
Note: It is advised not to run the delete policy if there is any doubt that the retention policy might
result in deletion of the file before you can restore it.
For information on the description of the parameters, see the mmcloudgateway command in IBM Spectrum
Scale: Command and Programming Reference.
To restore the configuration data and save it to the CCR, issue a command according to the following
syntax:
mmcloudgateway service restoreConfig --backup-config-file <name of the tar file>
For example, issue the following command to restore configuration data from the file,
tct_config_backup_20170915_085741.tar:
mmcloudgateway service restoreConfig --backup-config-file tct_config_backup_20170915_085741.tar
Note: During the restore operation, Transparent cloud tiering servers are restarted on all the nodes.
To check the integrity of database that is associated with a container, issue a command according to the
following syntax:
mmcloudgateway files checkDB --container-pair-set-name ContainerPairSetName
For example, issue the following command to check the integrity of the database that is associated with
the container, container1:
mmcloudgateway files checkDB --container-pair-set-name container1
Note: Cloud services configurations contain sensitive security-related information regarding encryption
credentials, so you must store your configuration back-up in a secure location. This configuration
information is critical in helping you restore your system data, if necessary.
You need to perform a recovery of the database when Transparent cloud tiering produces any of the
following messages in the logs:
v The cloud directory database for file system /dev/gpfs0 could not be found. Manual recovery is
necessary.
v The directory service for file system /dev/gpfs0 is not ready for use. Manual recovery is necessary.
v The cloud directory database for file system /dev/gpfs0 is corrupted. Manual recovery is necessary.
where,
filesystem is the device name of the file system whose database is corrupted and which is in need of
manual recovery.
--container-pair-set-name is the name of the container associated with the file system or fileset.
For example, if you want to recover the database associated with the file system, /dev/gpfs0 and the
container, container-1, issue this command:
mmcloudgateway files rebuildDB --container-pair-set-name container-1 /dev/gpfs0
Note: It is important that background maintenance be disabled when running this command. For more
information, see the Planning for maintenance activities topic in the IBM Spectrum Scale: Concepts, Planning,
and Installation Guide.
Overview
This section provides a brief introduction to SOBAR and the step-by-step instructions for backup and
restore by using SOBAR.
Scale out backup and restore (SOBAR) for Cloud services is a method of disaster recovery or service
migration that uses existing GPFS or SOBAR commands along with Cloud services scripts to back up
configuration and file system metadata on one cluster and restore them on a recovery cluster using one
sharing container pair set per node class.
Note: SOBAR works on file system boundaries, but with the Cloud services scripts, this procedure
should work whether you have configured Cloud services by file system or by fileset.
Note: This procedure is designed only for data tiered to object storage by Cloud services. All other data
needs to be backed up some other way.
Primary site
1. Allocate space in object storage for the backup: Create one sharing container pair set per Cloud
services node class that is shared between the primary and recovery clusters.
v This is used to export configuration data and metadata from the primary cluster to cloud and
import to the recovery cluster.
2. Allocate space in the associated file system for backup: Create a global file system directory to
handle the temporary space requirements of the SOBAR backup.
3. File system configuration backup: Back up the configuration of each file system associated with
Cloud services on the primary site (a minimal sketch of this backup appears after these steps). If you
have defined Cloud services by file set, then specify the file systems that those file sets are in.
a. Securely transfer these files to a safe place
b. Use these to recreate compatible recovery-site file systems
4. File system metadata backup: Back up the Cloud services inode/metadata for file systems from a
Cloud services node on the primary site using mcstore_backup.sh. If you have defined Cloud services
by file set, then specify the file systems that those file sets are in.
v This script automatically uploads the backup to the sharing container pair set on the cloud that you
created earlier.
5. Cloud services configuration backup. Back up the Cloud services configuration data from a Cloud
services node on the primary site by using the mmcloudgateway service backupConfig command.
v Securely transfer the resulting file to a safe place.
For detailed backup instructions, see “Procedure for backup” on page 683.
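For example, the file system configuration backup in step 3 might look like the following minimal sketch,
which reuses the mmbackupconfig command and the file system and backup file names that appear in
the detailed examples later in this chapter:
mmbackupconfig gpfs_tctbill1 -o /temp/powerleBillionBack_gpfs_tctbill1_02232018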
Recovery site
1. Recovery site hardware and configuration preparation: Prepare the recovery site to accommodate all
the file systems that you want to recover from the primary:
v Each file system on the recovery site must have at least as much capacity as its corresponding
primary-site file system. These file systems must be created, but not mounted.
Note: You need the full space for the entire file system even if you are restoring just file set subsets;
this is a SOBAR requirement.
v If you do not already have these file systems created, then you can wait until after running the
mmrestoreconfig command in the subsequent step. The output generated by the mmrestoreconfig
command offers guidance on how to create the recovery file system.
2. Allocate temporary restore staging space for the file system backup image: It is recommended to
use a separate dedicated file system.
3. File system configuration restore: Restore the policies of each file system, fileset definitions, etc.
Note: You can recall offline files from the cloud (both manually and transparently) on the restore site
only. Trying to recall offline files, migrated from the primary site, using a recall policy template does not
work, because the restore site cluster does not recognize these files to be part of an external pool.
However, files once migrated from the restore site can be recalled in bulk using a recall policy.
This topic describes the preparations that must be done at the primary site.
Note: This automatically provides a buffer, because the actual file will be compressed (tar) and
the final size will depend on the compression ratio.
6. Allocate space in associated file system for backup: This is a global file system directory that is
allocated to handle the temporary space requirements of the SOBAR backup.
v Use standard GPFS methodology or the install toolkit to allocate storage for this file system:
– Common GPFS Principles: “Common GPFS command principles” on page 109.
– Performing additional tasks using the installation toolkit: See Performing additional tasks using the
installation toolkit topic in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
This topic describes the preparation steps that must be done at the secondary (restore) site.
Note: Each file system on the recovery site will need at least as much capacity as its
corresponding primary-site file system. (Actual file system creation takes place in a later step.)
a. On the primary cluster, use the mmdf command for each file system to determine the required
amount of space necessary for the matching recovery site file system (look for total blocks in the
second column).
b. If it is necessary to determine sizes for separated metadata and data disks, look for the
corresponding information on the primary site (look for data and metadata distribution in the
second column). For example,
mmdf gpfs_tctbill1 | egrep ’(data)|(metadata)|failure|fragments|total|- ----- -|=====’
disk disk size failure holds holds free in KB free in KB
name in KB group metadata data in full blocks in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
(pool total) 15011648512 13535170560 ( 90%) 53986848 ( 0%)
============= ==================== ===================
(data) 12889330688 12675219456 ( 98%) 53886584 ( 0%)
(metadata) 2122317824 859951104 ( 41%) 100264 ( 0%)
============= ==================== ===================
(total) 15011648512 13535170560 ( 90%) 53986848 ( 0%)
Note: NSD details are filtered out; sizes are displayed in 1 KB blocks (use '--block-size auto' to
show them in human-readable format).
c. Use the previous information as a guide for allocating NSDs on the recovery site and preparing
stanza files for each file system.
Note: It is preferable to have the same number, size, and type of NSDs for each file system on the
recovery site as on the primary site; however, it is not a requirement. This simply makes the
auto-generated stanza file easier to modify in the recovery portion of this process.
3. Ensure that there are no preexisting Cloud services node classes on the recovery site, and that the
node classes that you create are clean and unused.
For more information, see “Designating the Cloud services nodes” on page 55.
7. Ensure that there is no active Cloud services configuration on the recovery site.
8. If this is an actual disaster, and you are transferring ownership of the Cloud services to the recovery
cluster, ensure that all write activity is suspended from the primary site while the recovery site has
ownership of Cloud services.
Note: Make sure the <global-filesystem-directory> you choose is mounted, accessible from all
nodes in the cluster, and has enough space to accommodate the backup.
The following is an example and a sample output:
[root@primary-site-tct-node scripts]# /opt/ibm/MCStore/scripts/mcstore_sobar_backup.sh
gpfs_tctbill1c powerleSOBAR1 TCTNodeClassPowerLE /ibm/gpfs_tctbill1
Creating backup for File System : gpfs_tctbill1
TOTAL_USED_INODE_SPACE 1261342920704
...
mmimgbackup: [I] Image backup of /dev/gpfs_tctbill1 begins at Wed Mar 14 17:03:05 EDT 2018.
...
mmimgbackup: [I] Image backup of /dev/gpfs_tctbill1 ends at Wed Mar 14 22:37:55 EDT 2018.
...
Exporting SOBAR backup: 9277128909880390775_gpfs_tctbill1_03-14-18-17-03-01.tar to cloud
and Data Container is : powerleSOBAR1
...
Completed backup procedure for File System : gpfs_tctbill1 use
9277128909880390775_gpfs_tctbill1_03-14-18-17-03-01.tar for restore operation
b. Repeat the mcstore_sobar_backup.sh command for each file system you are backing up, using the
same sharing_container_pair_set_name and the same global-filesystem-directory and the
tct_node-class-names where appropriate.
3. Cloud services configuration backup
a. Issue this command: mmcloudgateway service backupConfig --backup-file <backup_file>. For
example,
[root@primary-site-tct-node ~]# mmcloudgateway service backupConfig --backup-file
/temp/TCT_backupConfig
Before you begin, ensure that the prerequisites are met and the preparation steps for the recovery site are
performed. For more information, see “Prerequisites for the recovery site” on page 682.
Note: If NSD servers are used, then transfer the backups to one of them.
[root@primary-site ~]# scp powerleBillionBack_gpfs_tctbill1_02232018
root@recovery-site-nsd-server-node:/temp/powerleBillionBack_gpfs_tctbill1_02232018
scp powerleBillionBack_gpfs_tctbill3_02232018 root@recovery-site-nsd-server-node:/temp/
powerleBillionBack_gpfs_tctbill1DB_02232018
For example,
[root@ recovery-site-nsd-server-node ~]# mmrestoreconfig gpfs_tctbill1 -i
/roggr/powerleBillionBack_gpfs_tctbill1_02232018 -F
./powerleBillionRestore_gpfs_tctbill1_02232018
mmrestoreconfig: Configuration file successfully created in
./powerleBillionRestore_gpfs_tctbill1_02232018
mmrestoreconfig: Command successfully completed
Note: Disable Quota (remove the -Q yes option from this command) when you run it later in the
process.
Some excerpts from the restore_out_file (powerleBillionBack_gpfs_tctbill1_02232018):
## *************************************************************
## Filesystem configuration file backup for file system: gpfs_tctbill1
## Date Created: Tue Mar 6 14:15:05 CST 2018
##
## The '#' character is the comment character. Any parameter
## modified herein should have any preceding '#' removed.
## **************************************************************
etc....
# %pool:
# pool=system
# blockSize=4194304
# usage=dataAndMetadata
# layoutMap=scatter
# allowWriteAffinity=no
#
######### File system configuration #############
## The user can use the predefined options/option values
## when recreating the filesystem. The option values
## represent values from the backed up filesystem.
#
# mmcrfs FS_NAME NSD_DISKS -i 4096 -j scatter -k nfs4 -n 100 -B 4194304 -Q yes
--version 5.0.0.0 -L 33554432 -S relatime -T /ibm/gpfs_tctbill1 --inode-limit
407366656:307619840
#
# When preparing the file system for image restore, quota
# enforcement must be disabled at file system creation time.
# If this is not done, the image restore process will fail.
...
####### Disk Information #######
## Number of disks 15
## nsd11 991486976
## nsd12 991486976
etc....
## nsd76 1073741824
%nsd:
device=/dev/mapper/mpatht
nsd=nsd48
servers= nsdServer2,nsdServer1
usage=dataOnly
failureGroup=1
pool=system
%nsd:
device=/dev/mapper/mpathbz
nsd=nsd49
servers= nsdServer1,nsdServer2
usage=metadataOnly
failureGroup=1
pool=system
b. Modify the restore_out_file to match the configuration on the recovery site. An example portion of
the modified nsd stanzas for the restore_out_file is as follows:
%nsd:
device=/dev/mapper/mpatht
nsd=nsd48
servers= nsdServer2,nsdServer1
usage=dataOnly
failureGroup=1
pool=system
%nsd:
device=/dev/mapper/mpathbz
nsd=nsd49
servers= nsdServer1,nsdServer2
usage=metadataOnly
failureGroup=1
pool=system
3. Create recovery-site NSDs if necessary.
a. Use the newly modified restore_out_file (powerleBillionRestore_gpfs_tctbill1_02232018_nsd in
this example) to create NSDs on the recovery cluster. This command must be run from an NSD
server node (if NSD servers are in use):
[root@recovery-site-nsd-server-node ~]# mmcrnsd -F
/temp/powerleBillionRestore_gpfs_tctbill1_02232018_nsd
mmcrnsd: Processing disk mapper/mpathq
etc...
mmcrnsd: Processing disk mapper/mpathcb
mmcrnsd: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
b. Repeat the mmcrnsd command appropriately for each file system that you want to recover.
4. Create recovery-site file systems if necessary.
a. Use the same modified restore_out_file (powerleBillionRestore_gpfs_tctbill1_02232018_nsd in this
example) as input for the mmcrfs command, which will create the file system. The following
example is based on the command included in the <restore_out_file> (note the '-Q yes' option has
been removed). For example,
root@recovery-site-nsd-server-node ~]# mmcrfs gpfs_tctbill1 -F
/temp/powerleBillionRestore_gpfs_tctbill1_02232018_nsd
-i 4096 -j scatter -k nfs4 -n 100 -B 4194304 --version 5.0.0.0 -L 33554432 -S
relatime -T /ibm/gpfs_tctbill1 --inode-limit 407366656:307619840
b. Repeat the mmcrfs command appropriately for each file system that you want to recover.
5. Cloud services configuration restore (download SOBAR backup from the cloud for the file system).
a. Securely transfer the Cloud services configuration file to the desired location by using scp or any
other commands.
b. From the appropriate Cloud services server node on the recovery site (a node from the recovery
Cloud services node class), download the SOBAR.tar by using the mcstore_sobar_download.sh
script. This script is there in the /opt/ibm/MCStore/scripts folder on your Cloud services node.
Note: Make sure your local_backup_dir is mounted and has sufficient space to accommodate the
SOBAR backup file. It is recommended to use a GPFS file system.
Usage: mcstore_sobar_download.sh <tct_config_backup_path> <sharing_container_pairset_name>
<node-class-name> <sobar_backup_tar_name> <local_backup_dir>
For example,
[root@recovery-site-tct-node scripts]# ./mcstore_sobar_download.sh
/temp/TCT_backupConfig_20180306_123302.tar powerleSOBAR1 TCTNodeClassPowerLE
9277128909880390775_gpfs_tctbill1_03-14-18-17-03-01.tar /ibm/gpfs_tct_SOBAR1/
You are about to restore the TCT Configuration settings to the CCR.
Any new settings since the backup was made will be lost.
The TCT servers should be stopped prior to this operation.
etc...
mmcloudgateway: Sending the command to node recovery-site-tct-node.
Starting the Transparent Cloud Tiering service...
mmcloudgateway: The command completed on node recovery-site-tct-node.
etc...
Note: If your temporary restore staging space is on a Cloud services managed file system, then you
will have to delete and recreate this Cloud services managed file system at this point.
a. Restore policies for each file system using the mmrestoreconfig command.
Usage: mmrestoreconfig Device -i InputFile --image-restore
For example,
[root@recovery-site-tct-node ]# mmrestoreconfig gpfs_tctbill1 -i
/temp/powerleBillionBack_gpfs_tctbill1_02232018 --image-restore
--------------------------------------------------------
Configuration restore of gpfs_tctbill1 begins at Fri Mar 16 05:48:06 CDT 2018.
--------------------------------------------------------
mmrestoreconfig: Checking disk settings for gpfs_tctbill1:
mmrestoreconfig: Checking the number of storage pools defined for gpfs_tctbill1.
Note: If the Cloud directory is pointing to another file system, make sure that the file system is
mounted correctly before you run the restore script with the rebuildDB parameter value set to
yes.
[root@recovery-site-tct-node scripts]# ./mcstore_sobar_restore.sh
/ibm/gpfs_tct_SOBAR1/9277128909880390775_gpfs_tctbill1_03-14-18-17-03-01.tar gpfs_tctbill1
TCTNodeClassPowerLE yes /ibm/gpfs_tct_SOBAR1 >> /root/status.txt
etc...
etc.....
Running file curation policy and converting co-resident files to Non resident.
This will take some time. Please wait until this completes..
etc...
Completed file curation policy execution of converting co-resident files to
Non resident files.
running rebuild db for all the tiering containers for the given file system : gpfs_tctbill1
Running rebuild db for container pairset : powerlebill1spill2 and File System: gpfs_tctbill1
mmcloudgateway: Command completed.
Running rebuild db for container pairset : powerlebill1spill1 and File System: gpfs_tctbill1
mmcloudgateway: Command completed.
Running rebuild db for container pairset : powerlebill1 and File System: gpfs_tctbill1
etc...
b. Repeat the mcstore_sobar_restore.sh script appropriately for each file system that you want to
recover.
8. Enable Cloud services maintenance operations on the appropriate node class being restored on the
recovery site. For more information, see “Configuring the maintenance windows” on page 67.
9. Enable all Cloud services migration policies on the recovery site by using the --transparent-recalls
{ENABLE} option in the mmcloudgateway containerPairSet update command. For more information,
see “Binding your file system or fileset to the Cloud service by creating a container pair set” on page
63.
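For example, the following is a minimal sketch of re-enabling transparent recalls for one container pair
set; the container pair set name is illustrative, and additional required options, such as the Cloud services
node class, might apply in your environment (see the mmcloudgateway man page):
mmcloudgateway containerpairset update --container-pair-set-name powerlebill1 --transparent-recalls ENABLE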
Command: mmbackupconfig
Command: mcstore_sobar_backup.sh
Recovery Site
Command: mmrestoreconfig
Command: mcstore_sobar_download.sh
Table 63. Parameter description (continued)
v sharing_container_pairset_name: The sharing container that you created as a prerequisite to this
procedure.
v node-class-name: The name of the Cloud services node class that is restored.
v sobar_backup_tar_name: The name of the .tar file that was generated by the mcstore_sobar_backup.sh
script on the primary site, and transferred to the sharing container.
v local_backup_dir: A directory of your choice that is large enough to accept the SOBAR .tar file. It is
recommended to use a GPFS file system.
Command: mcstore_sobar_restore.sh
Cloud data sharing works by combining the import and export functions that allow data to be moved
across disparate geographical locations and/or heterogeneous application platforms. Cloud data sharing
maintains a set of records of those moves, called a manifest, that enables applications to know what has
moved. An application at one site can generate data and export it to the cloud, and applications at other
sites can import and process that data. Applications can determine what data has moved, and is therefore
now available, by looking at the manifest file. It is also an easy way to move data back and forth between
local and cloud storage systems. Cloud data sharing supports moving data to the cloud and pulling data
from the cloud. Cloud data sharing must be configured with a local file system and a cloud account. Once
configured, data can be moved between the IBM Spectrum Scale file system and the cloud account.
Application considerations
Exporting applications need some mechanism to both notify other applications that new data is available
on the cloud and give those applications some way of understanding what objects were put to the cloud.
When data is imported, there are cases in which not all the data is needed, and this unneeded data can be
identified by information in the file metadata. In these cases, it is recommended that, as a first pass, only
the file stubs are imported by using the "import-only-stub" option. The policy engine can then be used to
import only those files that are needed, thereby saving transfer time and cost. For now, this import of
stubs includes metadata only for data that was previously exported by IBM Spectrum Scale.
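For example, a first-pass stub import might look like the following minimal sketch; the container,
directory, and file names are illustrative, and the --import-only-stub and --directory options are described
with the import syntax later in this topic:
mmcloudgateway files import --container MyContainer --import-only-stub --directory /localdir /dir1/dir2/file1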
Note: For many cloud services, enabling indexed containers can impact performance, so it is possible that
cloud containers are not indexed. For these situations, a manifest is mandatory. But even with indexing
enabled, for large containers that contain many objects, a manifest can be useful.
There is a manifest utility that can run separately from IBM Spectrum Scale (it is a Python script) and
that can be used to look at the manifest. It provides a way to list and filter the manifest content,
providing comma-separated value output. Additionally, this manifest utility can be used by a
non-Spectrum Scale application to build a manifest file for other applications, including IBM Spectrum
Scale, to use for importing purposes.
To export files to a cloud storage tier, issue a command according to the following syntax:
mmcloudgateway files export
[--tag Tag ]
[--target-name TargetName ]
[--container Container | no-container ]
[--manifest-file ManifestFile ]
[--export-metadata [--fail-if-metadata-too-big ]]
[--strip-filesystem-root ]
File [File ...]
The following example exports a local file named /dir1/dir2/file1 to the cloud and stores it in a
container named "MyContainer". A manifest file will be created, and the object exported to the cloud will
have an entry in that manifest file tagged with "MRI_Images".
mmcloudgateway files export --container MyContainer --tag MRI_Images --export-metadata --manifest-file
/dir/ManifestFile /dir1/dir2/file1
To import files from a cloud storage tier, issue a command according to the following syntax:
mmcloudgateway files import
[--container Container | no-container ]
[--import-only-stub]
[--import-metadata ]
{ [--directory Directory] | [--directory-root DirectoryRoot] | [--target-name TargetName] }
{ PolicyFile -e | [--] File[ File ] }
The following example imports files from the cloud storage tier and creates a necessary local directory
structure.
mmcloudgateway files import --directory /localdir /dir1/dir2/file1
For more information on the usage of the import and export functions, see the mmcloudgateway man page.
Although files are exported to the cloud from an IBM Spectrum Scale environment, the files can be
imported by a non-IBM Spectrum Scale application. While you export files to the cloud, a manifest file is
built. The manifest file includes a list of the exported files and the metadata associated with the native
object storage.
When data is exported to the cloud, the manifest file is not automatically pushed to the cloud. You must
decide when and where to export the manifest file.
When to transfer: If you are using a policy to export data, a good time to export the manifest is
immediately after the policy run has completed successfully. Waiting too long can result in a manifest
that is too big and that does not provide frequent enough guidance to applications looking for
notifications about new data on the cloud. Constantly pushing out new manifests can create other
problems, where the applications have to deal with many small manifests and have to understand
which one they should use.
Where to transfer: Unlike transparent cloud tiering, cloud data sharing allows data to be transferred to
any container at any time. This freedom can be very useful, especially when setting up multiple tenants.
A centralized manifest is useful in a single tenant environment, but when there are multiple tenants with
different access privileges to different files it may be better to split up your manifest destinations
accordingly. Export all data targeted to a particular tenant and then send the manifest. Export data for the
next tenant, and so forth.
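For example, after finishing the export pass for one tenant, the manifest itself can be sent to that tenant's
container with another export call. The following is a minimal sketch that reuses the export syntax shown
earlier; the container name and manifest path are illustrative:
mmcloudgateway files export --container TenantAContainer /dir/ManifestFile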
Typically, this file is not accessed directly but rather is accessed by using the manifest utility. Each entry
in the manifest is a comma-separated record of the form TagID, CloudContainerName, TimeStamp,
File/Object Name,
where,
v TagID is an optional identifier the object is associated with.
v CloudContainerName is the name of the container the object was exported into.
v TimeStamp follows the format : "DD MON YYYY HH:MM:SS GMT".
v File/Object Name can contain commas, but not new line characters.
An example entry in a manifest utility stream output is as follows:
0, imagecontainer, 6 Sep 2016 20:31:45 GMT, images/a/cat.scan
You can use the mmcloudmanifest tool to parse the manifest file that is created by the mmcloudgateway
files export command or by any other means. By looking at the manifest files, an application can
download the desired files from the cloud.
The mmcloudmanifest tool is automatically installed on your cluster along with Transparent cloud tiering
rpms. However, you must install the following packages for the tool to work:
v Install Python version 2.7.5
v Install pip. For more information, see https://packaging.python.org/install_requirements_linux/
v Install apache-libcloud package by running the sudo pip install apache-libcloud command.
where,
v ManifestName: Specifies the name of the manifest object that is there on the cloud. For using a local
manifest file, specify the full path name to the manifest file.
v --properties-file PropertiesFile: Specifies the location of the properties file to be used when
retrieving the manifest file from the cloud. A template properties file is located at /opt/ibm/MCStore/
scripts/provider.properties. This file includes details such as the name of the cloud storage provider,
credentials, and URL.
v --persist-path PersistPath: Stores a local copy of the manifest file that is retrieved from the cloud in
the specified location.
v --manifest-container ManifestContainer: Name of the container in which the manifest is located.
v --tag-filter TagFilter: Lists only the entries whose Tag ID # matches the specified regular
expression (regex).
v --container-filter ContainerFilter: Lists only the entries whose container name matches the
specified regex.
v --from-time FromTime: Lists only the entries that occur starting at or after the specified time stamp.
The time stamp must be enclosed within quotations, and it must be in the 'DD MON YYYY HH:MM:SS
GMT' format. Example: '21 Aug 2016 06:23:59 GMT'
v --path-filter PathFilter: Lists only the entries whose path name matches the specified regex.
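For example, to list the entries of a local manifest file that were added at or after a given time, a minimal
sketch (the manifest file name is illustrative, and the time stamp format follows the --from-time
description above) is:
mmcloudmanifest parse-manifest manifest.txt --from-time '21 Aug 2016 06:23:59 GMT'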
The following command exports five CSV files tagged with "us-weather", along with the manifest file,
"manifest.txt", to the cloud:
mmcloudgateway files export --container arn8781724981111500553 --manifest-file manifest.txt
--tag us-weather /gpfs/weather_data/MetData_Oct06-2016-Oct07-2016-ALL.csv
/gpfs/weather_data/MetData_Oct07-2016-Oct08-2016-ALL.csv
/gpfs/weather_data/MetData_Oct08-2016-Oct09-2016-ALL.csv
/gpfs/weather_data/MetData_Oct09-2016-Oct10-2016-ALL.csv
/gpfs/weather_data/MetData_Oct10-2016-Oct11-2016-ALL.csv
The following command exports five CSV files tagged with "uk-weather", along with the manifest file,
"manifest.txt", to the cloud:
mmcloudgateway files export --container arn8781724981111500553 --manifest-file manifest.txt
--tag uk-weather /gpfs/weather_data/MetData_Oct06-2016-Oct07-2016-ALL.csv
/gpfs/weather_data/MetData_Oct07-2016-Oct08-2016-ALL.csv
/gpfs/weather_data/MetData_Oct08-2016-Oct09-2016-ALL.csv
/gpfs/weather_data/MetData_Oct09-2016-Oct10-2016-ALL.csv
/gpfs/weather_data/MetData_Oct10-2016-Oct11-2016-ALL.csv
The following command parses the manifest file and imports the files that are tagged with "us-weather"
to the local file system under the /gpfs directory:
mmcloudmanifest parse-manifest manifest.txt --tag-filter us-weather
| xargs mmcloudgateway files import --directory /gpfs --container arun8781724981111500553
After the import completes, a listing of the /gpfs directory shows the imported files:
total 64
drwxr-xr-x. 2 root root 4096 Oct 5 07:09 automountdir
-rw-r--r--. 1 root root 7859 Oct 18 02:15 MetData_Oct06-2016-Oct07-2016-ALL.csv
-rw-r--r--. 1 root root 7859 Oct 18 02:15 MetData_Oct07-2016-Oct08-2016-ALL.csv
-rw-r--r--. 1 root root 14461 Oct 18 02:15 MetData_Oct08-2016-Oct09-2016-ALL.csv
-rw-r--r--. 1 root root 14382 Oct 18 02:15 MetData_Oct09-2016-Oct10-2016-ALL.csv
-rw-r--r--. 1 root root 14504 Oct 18 02:15 MetData_Oct10-2016-Oct11-2016-ALL.csv
drwxr-xr-x. 2 root root 4096 Oct 17 14:12 weather_data
Note: If encryption was enabled in the older release, then you must include these parameters while
creating a container pair set:
v KeyManagerName
v ActiveKey
5. Import the files by using the mmcloudgateway files import command.
To stop Cloud services on all Transparent cloud tiering nodes in a cluster, issue the following command:
mmcloudgateway service stop -N alltct
To stop the Cloud services on a specific node or a list of nodes, issue a command according to this
syntax:
mmcloudgateway service stop [-N {alltct | Node[,Node...] | NodeFile | NodeClass}]
For example, to stop the service on the node, 10.11.12.13, issue this command:
mmcloudgateway service stop -N 10.11.12.13
Note: Before you stop Cloud services, ensure that no migration or recall operation is running on the
system where the service is stopped. You can find out the status of the migration or recall operation from
the GUI metrics.
To monitor the status of Cloud services, issue a command according to this syntax:
mmcloudgateway service status [-N {alltct | Node[,Node...] | NodeFile | NodeClass}]
[--cloud-storage-access-point-name CloudStorageAccessPointName] [-Y]
For example,
v To check the status of all available Transparent cloud tiering nodes in your cluster, issue this command:
mmcloudgateway service status -N alltct
Note: The ONLINE status here means that the container exists on the cloud, but it does not guarantee
that migrations will work. This is because there could be storage errors on the object storage, due to
which new object creation might fail. To verify the container status for migrations, issue the
mmcloudgateway containerpairset test command.
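For example, a minimal sketch (the container pair set name is illustrative, and the exact options might
differ in your environment; see the mmcloudgateway man page):
mmcloudgateway containerpairset test --container-pair-set-name container1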
For more information on all the available statuses and their description, see the Transparent Cloud Tiering
status description topic in IBM Spectrum Scale: Command and Programming Reference.
GUI navigation
CLI commands do not work on a cluster if all nodes in a node class are not running the same version of
the Cloud services. For example, you have three nodes (node1, node2, node3) in a node class
(TCTNodeClass1). Assume that the Cloud services version of node1 is 1.1.1, of node2 is 1.1.1, and of
node3 is 1.1.2. In this case, the CLI commands specific to 1.1.2 do not work in the TCTNodeClass1 node
class.
To check the service version of all Transparent cloud tiering nodes in a cluster, issue the following
command:
mmcloudgateway service version -N alltct
To check for service versions associated with Cloud services nodes, issue a command according to this
syntax:
mmcloudgateway service version [-N {Node[,Node...] | NodeFile | NodeClass}]
For example, to display the Cloud services version of the nodes, node1 and node2, issue the following
command:
mmcloudgateway service version -N node1,node2
To display the Cloud services version of each node in a node class, TCT, issue the following command:
mmcloudgateway service version -N TCT
Node Daemon node name TCT Type TCT Version Equivalent Product Version
------------------------------------------------------------------------------------------------
1 jupiter-vm1192.pok.stglabs.ibm.com Server 1.1.5 5.0.1.0
To display the client version of each node, issue the following command on the client node:
mmcloudgateway service version
Node Daemon node name TCT Type TCT Version Equivalent Product Version
------------------------------------------------------------------------------------------------
4 jupiter-vm649.pok.stglabs.ibm.com Client 1.1.5 5.0.1.0
To verify the client version of a particular node, issue the following command:
mmcloudgateway service version -N jupiter-vm717
Node Daemon node name TCT Type TCT Version Equivalent Product Version
------------------------------------------------------------------------------------------------
3 jupiter-vm717.pok.stglabs.ibm.com Client 1.1.5 5.0.1.0
To check for all nodes in a node class, issue the following command:
mmcloudgateway service version -N tct
Node Daemon node name TCT Type TCT Version Equivalent Product Version
------------------------------------------------------------------------------------------------
2 jupiter-vm482.pok.stglabs.ibm.com Server 1.1.5 5.0.1.0
1 jupiter-vm597.pok.stglabs.ibm.com Server 1.1.5 5.0.1.0
To migrate all files (including files within the subfolders) in one go, issue this command:
find <gpfs-mountpoint-folder-or-subfolder> -type f -exec mmcloudgateway files migrate {} +
This command passes the entire list of files to a single migrate process in the background as
follows:
mmcloudgateway files migrate <file1> <file2> <sub-folder1/file1> <sub-folder2/file1> ......
Migrating Transparent cloud tiering specific configuration to cloud storage might lead to issues
While you move data to an external cloud storage tier, you must not migrate files within
the Transparent cloud tiering internal folder (the .mcstore folder within the configured GPFS file
system) to cloud storage. Doing so might lead to undesirable behavior of the Transparent cloud
tiering service. To address this issue, include the EXCLUDE directive in the migration policy.
Refer to the /opt/ibm/MCStore/samples folder to view sample policies that can be customized as
per your environment and applied on the file system that is managed by Transparent cloud
tiering.
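The following is a minimal sketch of such an exclusion, modeled on the exclude_list macro in the
co-resident migration sample earlier in this chapter; the rule name and time interval are illustrative, and
it assumes an external pool named 'migrate' is defined as in that sample:
define(exclude_list,
(PATH_NAME LIKE '%/.mcstore/%' OR PATH_NAME LIKE '%/.mcstore.bak/%'))
RULE 'MigrateColdToCloud' MIGRATE FROM POOL 'system' TO POOL 'migrate'
WHERE NOT(exclude_list) AND (CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '30' DAYS)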
Running mmcloudgateway files delete on multiple files
Trying to remove multiple files in one go with the mmcloudgateway files delete
--delete-local-file command fails with a NullPointerException. This happens while the cloud
metrics are being cleaned up. Issue this command to remove the cloud objects:
find <gpfs-file-system> -type f -exec mmcloudgateway files delete {} \;
Range reads from the cloud object storage are not supported for transparent recall.
When a file is transparently recalled, the entire file is recalled.
Policy-based migrations
Policy-based migrations should be started only from Transparent cloud tiering server nodes.
Client nodes should be used only for manual migration.
File names with carriage returns or non-UTF-8 characters
Transparent cloud tiering does not perform any migration or recall operation on files whose
names include carriage returns or non-UTF-8 characters.
File systems mounted with the nodev option
If a file system is mounted with the nodev option, then it cannot be mounted to a directory with
an existing folder with the same name as the file system. Transparent cloud tiering is not
supported in this situation.
Administrator cannot add a container pair set while managing a file system with 'automount' setting
turned on.
Make sure that automount setting is not turned on while a file system is in use with Transparent
cloud tiering.
Files created through NFS clients when migrated to the cloud storage tier
If attribute caching is turned on on the NFS clients (that is, the file system is mounted without
the noac option), files that are migrated to the cloud storage tier remain in the co-resident status,
instead of the non-resident status.
Transparent cloud tiering configured with proxy servers
IBM Security Key Lifecycle Manager does not work when Transparent cloud tiering is configured
with proxy servers.
Swift Dynamic Large Objects
Transparent cloud tiering supports Swift Dynamic Large Objects only.
No support for file systems earlier than 4.2.x
Cloud services support IBM Spectrum Scale file systems versions 4.2.x and later only.
Running reconciliation during heavy writes and reads on the file system
Reconciliation fails when it is run during heavy I/O operations on the file system.
For current limitations and restrictions, see IBM Spectrum Scale FAQs.
For more information, see the topic Interoperability of Transparent Cloud Tiering with other IBM Spectrum
Scale features in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
For more information about file audit logging, see Introduction to file audit logging in IBM Spectrum Scale:
Concepts, Planning, and Installation Guide.
To start consumers on a given set of nodes in IBM Spectrum Scale, issue a command similar to the
following example:
mmaudit all consumerStart -N Node1,[Node2,...]
For more information, see Consumers in file audit logging in the IBM Spectrum Scale: Concepts, Planning, and
Installation Guide, “Stopping consumers in file audit logging,” Monitoring the consumer status in IBM
Spectrum Scale: Problem Determination Guide, and the mmaudit command in IBM Spectrum Scale: Command
and Programming Reference.
To stop consumers on a given set of nodes in IBM Spectrum Scale, issue a command similar to the
following example:
mmaudit all consumerStop -N Node1,[Node2,...]
For more information, see Consumers in file audit logging in the IBM Spectrum Scale: Concepts, Planning, and
Installation Guide, “Starting consumers in file audit logging,” Monitoring the consumer status in IBM
Spectrum Scale: Problem Determination Guide, and mmaudit command in IBM Spectrum Scale: Command and
Programming Reference.
Displaying topics that are registered in the message queue for file
audit logging
The user can run the mmmsgqueue list --topics command to see a list of topics for file audit logging.
If there are file audit logging topics in the message queue, the command displays output similar to the
following example:
157_6372129557625143312_24_audit
In the example output, the 6372129557625143312 number is the GPFS cluster ID. This number can help
distinguish the topic from other topics if the user is working in an environment with multiple clusters.
The inclusion of audit at the end of the name means that the topic is registered for file audit logging. For
more information, see The message queue in file audit logging in IBM Spectrum Scale: Concepts, Planning, and
Installation Guide.
For more information about adding nodes to an existing installation, see Adding nodes, NSDs, or file
systems to an existing installation in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Adding nodes to the message queue after it has been enabled is not supported. To enable file audit
logging on a new spectrumscale cluster node, you must disable file audit logging on all file systems
where it is enabled, remove the message queue configuration, add the message queue configuration with
the new broker nodes specified, and then re-enable file audit logging on the file systems. The following
set of steps details the process:
1. Issue the following command to view the file systems that are enabled for file audit logging:
mmaudit all list
2. Issue the following command to disable file audit logging on all file systems that have it enabled:
mmaudit Device disable
3. Reissue the following command. Verify that you get the following message when file audit logging
has been disabled on all file systems:
# mmaudit all list
[I] File audit logging is disabled for all devices.
4. Issue the following command to disable the message queue:
mmmsgqueue config --remove
Note:
v You should run this command when the message queue configuration needs to be altered or
removed. For example, instead of simply disabling the message queue, you should run this
command if the set of message queue servers needs to be altered.
v This command will also remove the message queue node classes and configuration information.
5. Issue the following command and verify that you get the following message:
# mmmsgqueue status
[I] MsgQueue currently not enabled.
6. Ensure that all rpm, package, OS, and hardware requirements stated in the Requirements and
limitations for file audit logging topic in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide
are met by the new node.
Note: Software requirements can be installed and verified using the installation toolkit. A node can be
added through the toolkit or by manually installing the required rpm and packages using the package
installation command based on the OS.
7. Enable file audit logging. For more information, see “Enabling file audit logging on a file system” on
page 87.
Note: Remember to add the new node when you enable the message queue.
8. Verify the nodes that are running the processes and their current states. For more information, see
Monitoring the message queue server and Zookeeper status in IBM Spectrum Scale: Problem Determination
Guide.
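For example, on a hypothetical cluster where brokernode1 through brokernode3 are the existing broker nodes,
newnode1 is the node being added, and gpfs0 is the only file system that had file audit logging enabled, steps 4
through 7 might look similar to the following sequence. All names here are placeholders, and the exact options
can differ on your level of code:
mmmsgqueue config --remove
mmmsgqueue enable -N brokernode1,brokernode2,brokernode3,newnode1
mmaudit gpfs0 enable
mmaudit all list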
| For more information about file audit logging events, see File audit logging events' descriptions in the IBM
| Spectrum Scale: Concepts, Planning, and Installation Guide.
| For more information, see the mmaudit command in the IBM Spectrum Scale: Command and Programming
| Reference.
| Designating additional broker nodes allows more parallelism in the reception of events from the
| producers and writing of events by the consumers. Although a minimum of three message queue (broker)
| nodes is required to enable the message queue, it is recommended that you designate at least five nodes
| as message queue (broker) nodes. The following command is used to add additional broker nodes to the
| message queue:
| mmmsgqueue config --add-nodes -N { NodeName[,NodeName...] | NodeFile | NodeClass }
| A comma-separated list of nodes, the path to a file with node names in it, or an existing node class can
| be used to represent the additional nodes in the message queue.
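| For example, to add two hypothetical nodes named broker04 and broker05 as additional broker nodes, you
| can issue the following command:
| mmmsgqueue config --add-nodes -N broker04,broker05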
| Specifying additional broker nodes for the message queue is an intensive process, so it should be
| attempted when the message queue is being least utilized (off-hours). Because this command can take
| some time to run, it is beneficial to examine the steps involved when adding additional broker nodes:
| 1. Verify that none of the nodes specified in the list of additional broker nodes are already broker nodes.
| 2. Verify that all of the required message queue packages exist on the prospective additional broker
| nodes.
| 3. Remove any existing Kafka logs from the prospective additional broker nodes. This will ensure that
| there is not a conflict in broker ID or cached information when the broker starts on the node.
| 4. Ensure that the broker daemons are started on the additional broker nodes.
| 5. For all file audit logging enabled file systems, additional partitions are added to the corresponding
| topics to account for the new broker nodes. This action increases parallelism and performance.
| 6. For all file audit logging enabled file systems, the existing partitions corresponding to the message
| queue topics are redistributed among all broker nodes (both old and new).
| 7. For all file audit logging enabled file systems, new consumer processes are started on the additional
| broker nodes.
| For more information, see Watch folder API in the IBM Spectrum Scale: Command and Programming
| Reference.
| To help you get started, IBM has provided a sample program that demonstrates how the API can be used
| to set up a watch. The sample program is located in the /usr/lpp/mmfs/samples/util/ directory.
| 1. Build the sample program tswf.C by running make tswf.
| 2. You can run tswf to see the usage. For instance, to watch a folder for all events and to redirect those
| events to a file, run:
| tswf /gpfs/fs0/watch -o /tmp/log.file
| 3. To watch for one or more specific events, run:
| tswf /gpfs/fs0/watch -e IN_CLOSE_WRITE -e IN_OPEN
| 4. To watch an inode space for all events, run:
| tswf /gpfs/fs0/watch_in -w i
| 5. To watch a fileset for the first 100 file or folder create events, run:
| tswf /gpfs/fs0/watch_fs -w f -n 100
Current disk drive systems are optimized for large streaming writes, but many workloads such as VMs
and databases consist of many small write requests, which do not perform well with disk drive systems.
To improve the performance of small writes, storage controllers buffer write requests in non-volatile
memory before writing them to storage. This works well for some workloads, but the amount of NVRAM
is typically quite small and therefore cannot scale to large workloads.
The goal of HAWC is to improve the efficiency of small write requests by absorbing them in any
nonvolatile fast storage device such as SSDs, Flash-backed DIMMs, or Flash DIMMs. Once the dirty data
is hardened, GPFS can immediately respond to the application write request, greatly reducing write
latency. GPFS can then flush the dirty data to the backend storage in the background.
By first buffering write requests, HAWC allows small writes to be gathered into larger chunks in the page
pool before they are written back to storage. This has the potential to improve performance as well by
increasing the average amount of data that GPFS writes back to disk at a time.
Further, when GPFS writes a data range smaller than a full block size to a block for the first time, the
block must first be fully initialized. Without HAWC, GPFS does this by writing zeroes to the block at the
time of the first write request. This increases the write latency since a small write request was converted
into a large write request (for example, a 4K write request turns into a 1MB write request). With HAWC,
this initialization can be delayed until after GPFS responds to the write request, or simply avoided
altogether if the application subsequently writes the entire block.
To buffer the dirty data, HAWC hardens write data in the GPFS recovery log. This means that with
HAWC, the recovery log must be stored on a fast storage device; if the recovery log resides on the same
storage device as the data, HAWC decreases performance by writing the data twice to the same device.
By hardening data in the recovery log, all incoming requests are transformed
into sequential operations to the log. In addition, it is important to note that applications never read data
from the recovery log, since all data that is hardened in the recovery log is always kept in the page pool.
The dirty data in the log is only accessed during file system recovery due to improper shutdown of one
or more mounted instances of a GPFS file system.
The maximum size of an individual write that can be placed in HAWC is currently limited to 64KB. This
limit has been set for several reasons, including the following:
v The benefit of writing data to fast storage decreases as the request size increases.
v Fast storage is typically limited to a much smaller capacity than disk subsystems.
v Each GPFS recovery log is currently limited to 1GB. Every file system and client pair has a unique
recovery log. This means that for each file system, the size of HAWC scales linearly with every
additional GPFS client. For example, with 2 file systems and 10 clients, there would be 20 recovery logs
used by HAWC to harden data.
Note that combining the use of HAWC with LROC allows GPFS to leverage fast storage on application
reads and writes.
With HAWC, storing the recovery log in fast storage has the added benefit that workloads that
experience bursts of small and synchronous write requests (no matter if they are random or sequential)
will also be hardened in the fast storage. Well-known applications that exhibit this type of write behavior
include VMs, databases, and log generation.
Since the characteristics of fast storage vary greatly, users should evaluate their application workload
with HAWC in their storage configuration to ensure a benefit is achieved. In general, however, speedups
should be seen in any environment that either currently lacks fast storage or has very limited (and
non-scalable) amounts of fast storage.
Enabling HAWC
To enable HAWC, set the write cache threshold for the file system to a value that is a multiple of 4 KB
and in the range 4 KB - 64 KB. The following example shows how to set the threshold for an existing file
system:
mmchfs gpfsA --write-cache-threshold 32K
The following example shows how to specify the threshold when you create a new file system:
mmcrfs /dev/gpfsB -F ./diskdef2.txt -B 1M --write-cache-threshold 32K -T /gpfs/gpfsB
After HAWC is enabled, all synchronous write requests less than or equal to the write cache threshold
are put into the recovery log. The file system sends a response to the application after it puts the write
request in the log. If the size of the synchronous write request is greater than the threshold, the data is
written directly to the primary storage system in the usual way.
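To verify the setting, you can list the file system attributes; the write cache threshold is included in the
attributes that mmlsfs displays (a value of 0 means that HAWC is not enabled). For example:
mmlsfs gpfsA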
Proper storage for the recovery log is important to improve the performance of small synchronous writes
and to ensure that written data survives node or disk failures. Two methods are available:
Method 1: Centralized fast storage
In this method, the recovery log is stored on a centralized fast storage device such as a storage
controller with SSDs, a flash system, or an IBM Elastic Storage™ Server (ESS) with SSDs.
You can use this configuration on any storage that contains the system pool or the system.log
pool. The faster the metadata pool is compared with the data storage, the more HAWC can help.
Method 2: Distributed fast storage in client nodes
In this method, the recovery log is stored on IBM Spectrum Scale client nodes on local fast
storage devices, such as SSDs, NVRAM, or other flash devices.
The local device NSDs must be in the system.log storage pool. The system.log storage pool
contains only the recovery logs.
It is a good idea to enable at least two replicas of the system.log pool. Local storage in an IBM
Spectrum Scale node is not highly available, because a node failure makes the storage device
inaccessible.
Use the mmchfs command with the --log-replicas parameter to specify a replication factor for the
system.log pool. This parameter, with the system.log capability, is intended to place log files in a
separate pool with replication different from other metadata in the system pool.
You can change log replication dynamically by running the mmchfs command followed by the
mmrestripefs command. However, you can enable log replication only if the file system was
created with a number of maximum metadata replicas of 2 or 3. (See the -M option of the
mmcrfs command.)
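For example, to keep two replicas of the recovery logs in the system.log pool and to apply the change to
the existing log files, you might run commands similar to the following, where gpfsA is a placeholder file
system name:
mmchfs gpfsA --log-replicas 2
mmrestripefs gpfsA -R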
Administrative tasks
where Seconds is the number of seconds to wait. This setting helps to avoid doing a restripe
after a temporary outage such as a node rebooting. The default time is 300 seconds.
Adding HAWC to an existing file system
Follow these steps:
1. If the metadata pool is not on a fast storage device, migrate the pool to a fast storage device.
For more information, see “Managing storage pools” on page 366.
2. Increase the size of the recovery log to at least 128 MB. Enter the following command:
mmchfs Device -L LogFileSize
where LogFileSize is the size of the recovery log. For more information, see the topic mmchfs
command in the IBM Spectrum Scale: Command and Programming Reference guide.
3. Enable HAWC by setting the write cache threshold, as described earlier in this topic.
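For example, for a hypothetical file system gpfsA, the sequence might be:
mmchfs gpfsA -L 128M
mmchfs gpfsA --write-cache-threshold 32K
Depending on your configuration, the new recovery log size might not take effect until the file system is
remounted; see the mmchfs command in the IBM Spectrum Scale: Command and Programming Reference.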
Local solid-state disks (SSDs) provide an economical way to create very large caches. The SSD cache
serves as an extension to the local buffer pool. As user data or metadata is evicted from the buffer pool in
memory, it can be stored in the local cache. A subsequent access will retrieve the data from the local
cache, rather than from the home location. The data stored in the local cache, like data stored in memory,
remains consistent. If a conflicting access occurs, the data is invalidated from all caches. In a like manner,
if a node is restarted, all data stored in the cache is discarded.
In theory, any data or metadata can be stored in the local SSD cache, but the cache works best for small
random reads where latency is a primary concern. Since the local cache typically offers less bandwidth
than the backend storage, it might be unsuitable for large sequential reads. The configuration options
provide controls over what is stored in the cache. The default settings are targeted at small random I/O.
The local read-only cache (LROC) function is disabled by default. To enable it, the administrator must
define an NSD for an LROC device. The LROC device is expected to be a solid-state disk (SSD) accessible
via SCSI. The device is defined as a standard NSD by mmcrnsd, but the DiskUsage is set to localCache.
The NSD must have a primary server and is not allowed to have other servers. The primary server must
be the node where the physical LROC device is installed. The device is not exported to other nodes in the
cluster. The storage pool and failure group defined for the NSD are ignored and should be set to null.
The mmcrnsd command writes a unique NSD volume ID onto the device.
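The following stanza file illustrates such a definition. The device path, NSD name, and server name are
placeholders for your environment:
%nsd:
  device=/dev/sdc
  nsd=node1_lroc01
  servers=node1
  usage=localCache
You can then create the NSD with a command such as mmcrnsd -F lroc.stanza, where lroc.stanza is the
stanza file shown above.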
The minimum size of a local read-only cache device is 4 GB. The local read-only cache requires memory
equal to 1% of the capacity of the LROC device.
Once the LROC device is defined, the daemon code at the primary server node is automatically told to
do device discovery. The daemon detects that localCache is defined for its use and determines the
mapping to the local device. The daemon then informs the local read-only cache code to begin using the
device for caching. Currently, there is a limit of four localCache devices per node. Note that the daemon
code does not need to be restarted to begin using the cache.
The LROC device can be deleted by using the mmdelnsd command. Both mmcrnsd and mmdelnsd can
be issued while the daemon is running with file systems mounted and online. The call to delete the NSD
first informs the daemon that the device is being deleted, which removes it from the list of active LROC
devices. Any data cached on the device is immediately lost, but data cached on other local LROC devices
is unaffected. Once the mmdelnsd command completes, the underlying SSD can be physically removed
from the node.
The NSD name for the LROC device cannot be used in any other GPFS commands, such as mmcrfs,
mmadddisk, mmrpldisk, mmchdisk or mmchnsd. The device is shown by mmlsnsd as a localCache.
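For example, you might confirm that the device is recognized, and later remove it, with commands similar
to the following, where node1_lroc01 is a placeholder NSD name:
mmlsnsd -X
mmdelnsd node1_lroc01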
Note: To avoid a security exposure, by default IBM Spectrum Scale does not allow file data from
encrypted files, which is held in memory as cleartext, to be copied into an LROC device. However, you
can set IBM Spectrum Scale to allow cleartext from encrypted files to be copied into an LROC device
with the following command:
mmchconfig lrocEnableStoringClearText=yes
You might choose this option if you have configured your system to remove the security exposure.
Note: You cannot specify the --daemon-interface option for a quorum node if CCR is enabled.
Temporarily change the node to a nonquorum node. Then issue the mmchnode command with the
--daemon-interface option against the nonquorum node. Finally, change the node back into a quorum
node.
4. If the IP addresses over which the subnet attribute is defined are changed, you must update your
configuration by issuing the mmchconfig command with the subnets attribute.
5. Start GPFS on all nodes with mmstartup -a.
6. Remove the unneeded old host names and IP addresses.
If only a subset of the nodes is affected, it may be easier to make the changes using these steps:
1. Before any of the host names or IP addresses are changed:
v Use the mmshutdown command to stop GPFS on all affected nodes.
v If the host names or IP addresses of the primary or secondary GPFS cluster configuration server
nodes must change, use the mmchcluster command to specify another node to serve as the primary
or secondary GPFS cluster configuration server.
v If the host names or IP addresses of an NSD server node must change, temporarily remove the
node from being a server with the mmchnsd command. Then, after the node has been added back
to the cluster, use the mmchnsd command to change the NSDs to their original configuration. Use
the mmlsnsd command to obtain the NSD server node names.
v If the affected node is a CES node, CES must be disabled from the node using the mmchnode -N
<node> --ces-disable command.
Note: You can use the mmchnode command if you need to re-enable CES on the node.
Note: When you change your cluster node names and IP addresses, ensure that the performance
monitoring configuration file is also changed accordingly. To change the performance monitoring
configuration file, follow these steps:
1. Save the current configuration file in a temporary file using the following command:
mmperfmon config show --config-file <tmp_file_name>
2. Change all occurrences of the old node name or IP address to the new one, using an editor.
Important: If no changes are needed, then you do not need to run the update.
3. Update the performance monitoring configuration information using the following command:
mmperfmon config update --config-file <tmp_file_name>
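For step 2, any text or stream editor can be used. For example, the following hypothetical sed command
replaces an old node name with a new one in the saved configuration file:
sed -i 's/oldNodeName/newNodeName/g' <tmp_file_name>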
If you use a federation type configuration, and the affected node is a collector, you need to change the
names and IP addresses of the peers in the /opt/IBM/zimon/ZIMonCollector.cfg file on all collector nodes
| as well. Starting with IBM Spectrum Scale version 5.0.2, the peers section will be managed automatically.
Important: If the long admin node names (FQDN) of any call home group members were changed, the
customer must delete the affected call home groups, and then create new ones if needed. Run the
mmcallhome group list command to verify whether nodes that are members of a call home group are
deleted, or their long admin node names (including domain) are changed. In such cases, the mmcallhome
group list command displays ------ instead of the names of such nodes. For more information, see the
mmcallhome command section in the IBM Spectrum Scale: Command and Programming Reference.
If you are performing the procedure during a scheduled maintenance window and GPFS can be shut
down on all of the nodes in the cluster, issue the command:
mmchconfig enableIPv6=yes
After the command finishes successfully, you can start adding new nodes with IPv6 addresses.
If it is not possible to shut down GPFS on all of the nodes at the same time, issue the command:
mmchconfig enableIPv6=prepare
The next step is to restart GPFS on each of the nodes so that they can pick up the new configuration
setting. This can be done one node at a time when it is convenient. To verify that a particular node has
been refreshed, issue:
mmdiag --config | grep enableIPv6
This command will only succeed when all GPFS daemons have been refreshed. Once this operation
succeeds, you can start adding new nodes with IPv6 addresses.
To convert an existing node from an IPv4 to an IPv6 interface, use one of the procedures described in
“Changing IP addresses and host names” on page 713.
Before a lock on an object can be granted to a thread on a particular node, the lock manager on that node
must obtain a token from the token server. The total number of token manager nodes depends on the
number of manager nodes defined in the cluster.
When a file system is first mounted, the file system manager is the only token server for the file system.
Once the number of external mounts exceeds one, the file system manager appoints all the other manager
nodes defined in the cluster to share the token server load. Once the token state has been distributed, it
remains distributed until all external mounts have gone away. The only nodes that are eligible to become
token manager nodes are those designated as manager nodes.
The number of files for which tokens can be retained on a manager node is restricted by the values of the
maxFilesToCache and maxStatCache configuration parameters of the mmchconfig command. Distributing
the tokens across multiple token manager nodes allows more tokens to be managed or retained
concurrently, improving performance in situations where many lockable objects are accessed concurrently.
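For example, to allow more tokens to be cached, you might increase these values with a command similar
to the following, where the values and the node class name are placeholders rather than recommendations.
The changes typically take effect only after GPFS is restarted on the affected nodes:
mmchconfig maxFilesToCache=100000,maxStatCache=50000 -N managerNodes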
or
mmexportfs all -o ExportDataFile
6. Ensure that the file system disks from the old GPFS cluster are properly connected, and are online
and available to be accessed from appropriate nodes of the new GPFS cluster.
or
mmimportfs all -i ExportDataFile
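For example, to export and import a single file system rather than all file systems, you can specify the
device name in place of all. Here fs1 and the output file name are placeholders:
mmexportfs fs1 -o /tmp/fs1.export
mmimportfs fs1 -i /tmp/fs1.export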
You can specify a different port number using the mmchconfig command:
mmchconfig tscTcpPort=PortNumber
When the main GPFS daemon (mmfsd) is not running on the primary and backup configuration server
nodes, a separate service (mmsdrserv) is used to provide access to the configuration data to the rest of
the nodes in the cluster. The port number used for this purpose is controlled with the mmsdrservPort
parameter. By default, mmsdrserv uses the same port number as the one assigned to the main GPFS
daemon. If you change the daemon port number, you must specify the same port number for mmsdrserv
using the following command:
mmchconfig mmsdrservPort=PortNumber
Do not change the mmsdrserv port number to a number that is different from the daemon port number.
Certain commands (mmadddisk, mmchmgr, and so on) require an additional socket to be created for the
duration of the command. The port numbers assigned to these temporary sockets are controlled with the
tscCmdPortRange configuration parameter. If an explicit range is not specified, the port number is
dynamically assigned by the operating system from the range of ephemeral port numbers. If you want to
restrict the range of ports used by IBM Spectrum Scale commands, use the mmchconfig command:
mmchconfig tscCmdPortRange=LowNumber-HighNumber
In a remote cluster setup, if IBM Spectrum Scale on the remote cluster is configured to use a port number
other than the default, you have to specify the port number to be used with the mmremotecluster
command:
mmremotecluster update ClusterName -n tcpPort=PortNumber,Node,Node...
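For example, the following hypothetical sequence changes the daemon port, keeps mmsdrserv on the same
port, restricts the command port range, and registers the nonstandard port for a remote cluster. The port
number, cluster name, and contact nodes are placeholders:
mmchconfig tscTcpPort=1192
mmchconfig mmsdrservPort=1192
mmchconfig tscCmdPortRange=60000-61000
mmremotecluster update remote.cluster.example -n tcpPort=1192,nodeA,nodeB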
For related information, see the topic “Firewall recommendations for internal communication among
nodes” on page 720.
Note: Ports configured for the IBM Spectrum Scale remote shell command (such as ssh) or the remote
file copy command (such as scp) and the ICMP echo command (network ping) also must be unblocked in
the firewall for IBM Spectrum Scale to function properly.
The installation toolkit uses the following ports during IBM Spectrum Scale installation.
Table 67. Recommended port numbers that can be used for installation

Port Number   Protocol   Service Name   Components involved in communication
8889          TCP        Chef           Intra-cluster and installer server
10080         TCP        Repository     Intra-cluster and installer server
123           UDP        NTP            Intra-cluster or external, depending on the NTP server location
The port that is used during the installation (8889) can be blocked when the installation is over. You can
get the list of protocol IPs by using the mmlscluster --ces command. Use the mmlscluster command to
get the list of all internal IPs.
Chef is the underlying technology used by the installation toolkit. During installation, a Chef server is
started on the installation server, and repositories are created to store information about the various IBM
Spectrum Scale components. Each node being installed by the installation toolkit must be able to establish
a connection to the repository and the Chef server itself. Typically, the installation toolkit is run on the
NTP is not necessary but time sync among nodes is highly recommended and it is required for protocol
nodes.
v The SSH port 22 is used for command execution and general node-to-node configuration as well as
administrative access.
v The primary GPFS daemons (mmfsd and mmsdrserv), by default, listen on port 1191. This port is
essential for basic cluster operation. The port can be changed manually by setting the mmsdrservPort
configuration variable with the mmchconfig mmsdrservPort=PortNumber command.
v The ephemeral port range of the underlying operating system is used when IBM Spectrum Scale
creates additional sockets to exchange data among nodes. This occurs while executing certain
commands and this process is dynamic based on the point in time needs of the command as well as
other concurrent cluster activities. You can define an ephemeral port range manually by setting the
tscCmdPortRange configuration variable with the mmchconfig tscCmdPortRange=LowNumber-HighNumber
command.
If the installation toolkit is used, the ephemeral port range is automatically set to 60000-61000. Firewall
ports must be opened according to the defined ephemeral port range. If commands such as mmlsmgr and
mmcrfs hang, it might indicate that the ephemeral port range is blocked or improperly configured.
For related information, see the topic “IBM Spectrum Scale port usage” on page 716.
The following are the recommendations for securing internal communications among IBM Spectrum Scale
nodes:
v Allow connection only to the GPFS cluster node IPs (internal IPs and protocol node IPs) on port 1191.
Block all other external connections on this port. Use the mmlscluster --ces command to get the list of
protocol node IP and use the mmlscluster command to get the list of IPs of internal nodes.
v Allow all external communications request that are coming from the admin or management network
and IBM Spectrum Scale internal IPs on port 22.
v Certain commands such as mmadddisk, mmchmgr, and so on require an extra socket to be created for the
duration of the command. The port numbers that are assigned to these temporary sockets are
controlled with the tscCmdPortRange configuration parameter. If an explicit range is not specified, the
port number is dynamically assigned by the operating system from the range of ephemeral port numbers.
Note: NFSV3 uses the dynamic ports for NLM, MNT, and STATD services. When an NFSV4 server is
used with the firewall, these services must be configured with static ports.
The following table lists the ports configured for object access.
Table 72. Port numbers for object access

Port Number   Protocol   Service Name                              Components that are involved in communication
8080          TCP        Object Storage Proxy                      Object clients and IBM Spectrum Scale protocol node
6200          TCP        Object Storage (local account server)     Local host
6201          TCP        Object Storage (local container server)   Local host
6202          TCP        Object Storage (local object server)      Local host
Shell access by non-root users must be restricted on IBM Spectrum Scale protocol nodes where the object
services are running to prevent unauthorized access to object data.
Note: The reason for these restrictions is that there is no authentication of requests made on ports 6200,
6201, 6202, and 6203, so it is critical to ensure that these ports are protected from access by unauthorized
clients.
You can configure either an external or internal Keystone server to manage the authentication requests.
Keystone uses the following ports:
Table 73. Port numbers for object authentication

Port Number   Protocol   Service Name              Components that are involved in communication
5000          TCP        Keystone Public           Authentication clients and object clients
35357         TCP        Keystone Internal/Admin   Authentication and object clients and Keystone administrator
These ports are applicable only if Keystone is hosted internally on the IBM Spectrum Scale system. The
following port usage applies:
v Allow all external communication requests that are coming from the admin or management network
and IBM Spectrum Scale internal IPs on port 35357.
v Allow all external communication requests that are coming from clients to IBM Spectrum Scale for
object storage on port 5000. Block all other external connections on this port.
The Postgres database server for object protocol is configured to use the following port:
Table 74. Port numbers for Postgres database for object protocol

Port Number   Protocol      Service Name     Components that are involved in communication
5431          TCP and UDP   postgresql-obj   Inter-protocol nodes
Note: The Postgres instance used by the object protocol uses port 5431. This is different from the default
port to avoid conflict with other Postgres instances that might be on the system including the instance for
IBM Spectrum Scale GUI.
Consolidated list of recommended ports that are used for installation, internal
communication, and protocol access
The following table provides a consolidated list of recommended ports and firewall rules.
Table 75. Consolidated list of recommended ports for different functions

Function: Installer
   Dependent network service names: Chef
   External ports used for file and object access: N/A
   Internal ports used for inter-cluster communication: 8889 (chef), 10080 (repo)
   UDP / TCP: TCP
   Nodes for which the rules are applicable: GPFS server, NSD server, protocol nodes

Function: GPFS (internal communication)
   Dependent network service names: GPFS
   External ports used for file and object access: N/A
   Internal ports used for inter-cluster communication: 1191 (GPFS), 60000-61000 for tscCmdPortRange, 22 for SSH
   UDP / TCP: TCP and UDP; TCP only for 22
   Nodes for which the rules are applicable: GPFS server, NSD server, protocol nodes

Function: SMB
   Dependent network service names: gpfs-smb.service, gpfs-ctdb.service
   External ports used for file and object access: 445
   Internal ports used for inter-cluster communication: 4379 (CTDB)
   UDP / TCP: TCP
   Nodes for which the rules are applicable: Protocol nodes only

Function: NFS
   Dependent network service names: gpfs.ganesha.nfsd, rpcbind, rpc.statd
   External ports used for file and object access: 2049 (NFS_PORT - required only by NFSV3), 111 (RPC - required only by NFSV3), 32765 (STATD_PORT), 32767 (MNT_PORT - required only by NFSV3), 32768 (RQUOTA_PORT - required by both NFSV3 and NFSV4), 32769 (NLM_PORT - required only by NFSV3)
   Internal ports used for inter-cluster communication: N/A
   UDP / TCP: TCP and UDP
   Nodes for which the rules are applicable: Protocol nodes only
The following table lists the ports that need to be used to secure GUI.
Table 76. Firewall recommendations for GUI

Port Number   Functions          Protocol
9080          Installation GUI   HTTP
9443          Installation GUI   HTTPS
80            Management GUI     HTTP
443           Management GUI     HTTPS
All nodes of the IBM Spectrum Scale cluster must be able to communicate with the GUI nodes through
the ports 80 and 443. If multiple GUI nodes are available in a cluster, the communication among those
GUI nodes is carried out through the port 443.
Both the management GUI and IBM Spectrum Scale management API share the same ports. That is, 80
and 443. However, for APIs, the ports 443 and 80 are internally forwarded to 47443 and 47080
respectively. This is done automatically by an iptables rule that is added during the startup of the GUI
and is removed when the GUI is being stopped. The update mechanism for iptables can be disabled by
setting the variable UPDATE_IPTABLES to false in the /etc/sysconfig/gpfsgui file.
Note: The IBM Spectrum Scale GUI ports are not configurable. The GUI cannot coexist with a web server
that uses the same ports.
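If you need to confirm that the forwarding rule is in place on a GUI node, you can inspect the NAT table;
this is only an illustrative check:
iptables -t nat -L -n | grep 47443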
The management GUI uses ZIMon to collect performance data. ZIMon collectors are normally deployed
with the management GUI and sometimes on other systems in a federated configuration. Each ZIMon
collector uses three ports, which can be configured in ZIMonCollector.cfg. The default ports are 4739,
9085, and 9084. The GUI sends its queries on ports 9084 and 9085, and these ports are accessible
only from the localhost. For more information on the ports used by the performance monitoring tools, see
“Firewall recommendations for Performance Monitoring tool” on page 727.
The following table lists the ports for communicating with SKLM. The SKLM ports apply for both IBM
Spectrum Scale file encryption and Transparent Cloud Tiering (TCT).
Table 77. Firewall recommendations for SKLM

Port Number: 9080 (SKLM 2.6); 443 (SKLM 2.7); 443 (SKLM 3.0)
   Protocol: TCP
   Service: SKLM REST admin interface
   Components: mmsklmconfig utility for configuring Spectrum Scale

Port Number: 5696
   Protocol: TCP
   Service: SKLM Key Management Interoperability Protocol (KMIP) interface
   Components: Spectrum Scale daemon for retrieving encryption keys, mmsklmconfig utility for configuring Spectrum Scale
The following table lists the ports for communicating with DSM. The DSM ports are used by IBM
Spectrum Scale file encryption.
Table 78. Firewall recommendations for DSM

Port Number: 8445
   Protocol: TCP
   Service: DSM administration web GUI
   Components: The mmsklmconfig command for retrieving a server certificate chain

Port Number: 5696
   Protocol: TCP
   Service: DSM Key Management Interoperability Protocol (KMIP) interface
   Components: The Spectrum Scale daemon for retrieving encryption keys
The following table lists the ports for communicating with a REST API server.
Table 79. Firewall recommendations for REST API
Port Number Interface type Protocol
8191 REST API HTTPS
Users of the REST API must communicate with a server through port 8191.
Important:
v The 4739 port needs to be open when a collector is installed.
v The 9085 port needs to be open when there are two or more collectors.
For AFM data transfers, either NFS or NSD is used as the transport protocol.
v For port requirements of NFS, see “Firewall recommendations for protocol access” on page 721.
v For port requirements of NSD, see “Firewall recommendations for internal communication among
nodes” on page 720.
In both cases, all nodes in the cluster requiring access to another cluster's file system must be able to
open a TCP/IP connection to every node in the other cluster. For information on the basic GPFS cluster
operation port requirements, see “Firewall recommendations for internal communication among nodes”
on page 720.
Note: Each cluster participating in a remote mount might reside on the same internal network or on a
separate network from the host cluster. From a firewall standpoint, this means that the host cluster might
need ports to be opened to a number of external networks, depending on how many separate clusters are
accessing the host.
Both functions require an open path for communication between the nodes designated for use with
mmbackup or HSM policies and the external IBM Spectrum Protect server. The port requirement listed in
the following table can be viewed in the dsm.sys configuration file also.
Table 81. Required port number for mmbackup and HSM connectivity to IBM Spectrum Protect server

Port number   Protocol   Service name   Components involved in communication
1500          TCP        TSM            IBM Spectrum Protect Backup-Archive client communication with server
This requires that each IBM Spectrum Archive node can communicate with the rest of the cluster using
the ports required for basic GPFS cluster operations. For more information, see “Firewall
recommendations for internal communication among nodes” on page 720. In addition, IBM
Spectrum Archive communicates by using RPC. For RPC-related port requirements, see “Firewall
recommendations for protocol access” on page 721.
IBM Spectrum Archive can connect to tape drives using a SAN or a direct connection.
For IBM Spectrum Scale™ call home, the IBM Support server is accessible through port 443. The following
table lists the port and the host/IP addresses for communicating with the IBM Support server.
Table 83. Recommended port numbers that can be used for call home.

Port Number   Function    Protocol   Host/IP
443           Call home   HTTPS      esupport.ibm.com: 129.42.56.189, 129.42.60.189, 129.42.54.189
SLES 12
1. Open the YaST tool by issuing the following command: yast
2. Click Security and Users > Firewall.
3. Select the Allowed Services tab and click Advanced....
4. Enter the desired port range in the from-port-start:to-port-end format and specify the protocol
(TCP or UDP). For example, enter 60000:60010 to open ports 60000 to 60010.
5. Click OK to close the Advanced dialog box.
6. Click Next and review the summary of your changes.
7. Click Finish to apply your changes.
The iptables utility is available on most Linux distributions to set firewall rules and policies. These
Linux distributions include Red Hat Enterprise Linux 6.8, Red Hat Enterprise Linux 7.x, CentOS 7.x, SLES
12, Ubuntu, and Debian. Before using these commands, check which firewall zones might be enabled by default.
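For example, the following illustrative iptables rules accept inbound GPFS daemon traffic on port 1191 and
the ephemeral command port range that is described earlier; adjust the chains, interfaces, and source ranges
for your environment:
iptables -A INPUT -p tcp --dport 1191 -j ACCEPT
iptables -A INPUT -p tcp --dport 60000:61000 -j ACCEPT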
For information on how CES IPs are aliased to network adapters, see “CES IP aliasing to network
adapters on protocol nodes” on page 28.
Special administrative exports are required for each file system for which file system activity must be
detected and logged. This also requires the use of Active Directory as the file authentication method for
the IBM Spectrum Scale cluster.
Ensure that the prerequisites are met before configuring the IBM Spectrum Scale node and installing the
Varonis agent. Once the agent is successfully installed, install the DatAdvantage software and the
management console. Tune Varonis DatAdvantage to work with IBM Spectrum Scale to audit the file
system activity. For more information on Varonis DatAdvantage, and how it can be incorporated with
IBM Spectrum Scale, see Varonis Audit Logging.
Note:
To view the Varonis Audit Logging information on the IBM developerWorks website, non-IBM users have
to register and create a username and password to access the information. IBM users must scroll to the
bottom of the page and use the link to log in with their IBM intranet username and password.
IBM supports higher versions of the browsers if the vendors do not remove or disable function that the
product relies upon. For browser levels higher than the versions that are certified with the product,
customer support accepts usage-related and defect-related service requests. If the support center cannot
re-create the issue, support might request the client to re-create the problem on a certified browser
version. Defects are not accepted for cosmetic differences between browsers or browser versions that do
not affect the functional behavior of the product. If a problem is identified in the product, defects are
accepted. If a problem is identified with the browser, IBM might investigate potential solutions or
work-arounds that the client can implement until a permanent solution becomes available.
For limitations of the installation GUI, see Installing IBM Spectrum Scale by using the graphical user interface
(GUI) in IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Accessibility features
The following list includes the major accessibility features in IBM Spectrum Scale:
v Keyboard-only operation
v Interfaces that are commonly used by screen readers
v Keys that are discernible by touch but do not activate just by touching them
v Industry-standard devices for ports and connectors
v The attachment of alternative input and output devices
IBM Knowledge Center, and its related publications, are accessibility-enabled. The accessibility features
are described in IBM Knowledge Center (www.ibm.com/support/knowledgecenter).
Keyboard navigation
This product uses standard Microsoft Windows navigation keys.
IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that
only that IBM product, program, or service may be used. Any functionally equivalent product, program,
or service that does not infringe any IBM intellectual property right may be used instead. However, it is
the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or
service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:
IBM Director of Licensing IBM Corporation North Castle Drive, MD-NC119 Armonk, NY 10504-1785 US
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual
Property Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd. 19-21,
Nihonbashi-Hakozakicho, Chuo-ku Tokyo 103-8510, Japan
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in
any manner serve as an endorsement of those websites. The materials at those websites are not part of
the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:
IBM Director of Licensing IBM Corporation North Castle Drive, MD-NC119 Armonk, NY 10504-1785 US
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
The performance data discussed herein is presented as derived under specific operating conditions.
Actual results may vary.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.
Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice,
and represent goals and objectives only.
All IBM prices shown are IBM's suggested retail prices, are current and are subject to change without
notice. Dealer prices may vary.
This information is for planning purposes only. The information herein is subject to change before the
products described become available.
This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to actual people or business enterprises is
entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs
in any form without payment to IBM, for the purposes of developing, using, marketing or distributing
application programs conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly tested under all
conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.
Each copy or any portion of these sample programs or any derivative work must include
a copyright notice as follows:
If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of the Open Group in the United States and other countries.
Applicability
These terms and conditions are in addition to any terms of use for the IBM website.
Personal use
You may reproduce these publications for your personal, noncommercial use provided that all
proprietary notices are preserved. You may not distribute, display or make derivative work of these
publications, or any portion thereof, without the express consent of IBM.
Commercial use
You may reproduce, distribute and display these publications solely within your enterprise provided that
all proprietary notices are preserved. You may not make derivative works of these publications, or
reproduce, distribute or display these publications or any portion thereof outside your enterprise, without
the express consent of IBM.
Rights
Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either
express or implied, to the publications or any information, data, software or other intellectual property
contained therein.
IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of
the publications is detrimental to its interest or, as determined by IBM, the above instructions are not
being properly followed.
You may not download, export or re-export this information except in full compliance with all applicable
laws and regulations, including all United States export laws and regulations.
collect personally identifiable information. If this Software Offering uses cookies to collect personally
identifiable information, specific information about this offering’s use of cookies is set forth below.
This Software Offering does not use cookies or other technologies to collect personally identifiable
information.
If the configurations deployed for this Software Offering provide you as customer the ability to collect
personally identifiable information from end users via cookies and other technologies, you should seek
your own legal advice about any laws applicable to such data collection, including any requirements for
notice and consent.
For more information about the use of various technologies, including cookies, for these purposes, See
IBM’s Privacy Policy at http://www.ibm.com/privacy and IBM’s Online Privacy Statement at
http://www.ibm.com/privacy/details the section entitled “Cookies, Web Beacons and Other
Technologies” and the “IBM Software Products and Software-as-a-Service Privacy Statement” at
http://www.ibm.com/software/info/product-privacy.
Glossary

…mirroring of data protects it against data loss within the database or within the recovery log.

Microsoft Management Console (MMC)
      A Windows tool that can be used to do basic configuration tasks on an SMB server. These tasks include administrative tasks such as listing or closing the connected users and open files, and creating and manipulating SMB shares.

multi-tailed
      A disk connected to multiple nodes.

N

namespace
      Space reserved by a file system to contain the names of its objects.

Network File System (NFS)
      A protocol, developed by Sun Microsystems, Incorporated, that allows any host in a network to gain access to another host or netgroup and their file directories.

Network Shared Disk (NSD)
      A component for cluster-wide disk naming and access.

NSD volume ID
      A unique 16 digit hex number that is used to identify and access all NSDs.

node
      An individual operating-system image within a cluster. Depending on the way in which the computer system is partitioned, it may contain one or more nodes.

node descriptor
      A definition that indicates how GPFS uses a node. Possible functions include: manager node, client node, quorum node, and nonquorum node.

node number
      A number that is generated and maintained by GPFS as the cluster is created, and as nodes are added to or deleted from the cluster.

node quorum
      The minimum number of nodes that must be running in order for the daemon to start.

node quorum with tiebreaker disks
      A form of quorum that allows GPFS to run with as little as one quorum node available, as long as there is access to a majority of the quorum disks.

non-quorum node
      A node in a cluster that is not counted for the purposes of quorum determination.

P

policy
      A list of file-placement, service-class, and encryption rules that define characteristics and placement of files. Several policies can be defined within the configuration, but only one policy set is active at one time.

policy rule
      A programming statement within a policy that defines a specific action to be performed.

pool
      A group of resources with similar characteristics and attributes.

portability
      The ability of a programming language to compile successfully on different operating systems without requiring changes to the source code.

primary GPFS cluster configuration server
      In a GPFS cluster, the node chosen to maintain the GPFS cluster configuration data.

private IP address
      An IP address used to communicate on a private network.

public IP address
      An IP address used to communicate on a public network.

Q

quorum node
      A node in the cluster that is counted to determine whether a quorum exists.

quota
      The amount of disk space and number of inodes assigned as upper limits for a specified user, group of users, or fileset.

quota management
      The allocation of disk blocks to the other nodes writing to the file system, and comparison of the allocated space to quota limits at regular intervals.

subblock
      The smallest unit of data accessible in an I/O operation, equal to one thirty-second of a data block.

system storage pool
      A storage pool containing file system control structures, reserved files, directories, symbolic links, special devices, as well as the metadata associated with regular files, including indirect blocks and extended attributes. The system storage pool can also contain user data.

T

token management
      A system for controlling file access in which each application performing a read or write operation is granted some form of access to a specific block of file data. Token management provides data consistency and controls conflicts. Token management has two components: the token management server, and the token management function.

token management function
      A component of token management that requests tokens from the token management server. The token management function is located on each cluster node.

token management server
      A component of token management that controls tokens relating to the operation of the file system. The token management server is located at the file system manager node.

transparent cloud tiering (TCT)
      A separately installable add-on feature of IBM Spectrum Scale that provides a native cloud storage tier. It allows data center administrators to free up on-premise storage capacity, by moving out cooler data to the cloud storage, thereby reducing capital and operational expenditures.

twin-tailed
      A disk connected to two nodes.

U

user storage pool
      A storage pool containing the blocks of data that make up user files.

V

VFS
      See virtual file system.

virtual file system (VFS)
      A remote file system that has been mounted so that it is accessible to the local user.

virtual node (vnode)
      The structure that contains information about a file system object in a virtual file system (VFS).
Index

Clustered Configuration Repository (CCR)
   failback with temporary loss 449
Clustered NFS (CNFS) environment
   administration 465
   configuration 465
   failover 463
   implementing
      Linux 463
   load balancing 464
   locking 464
   monitoring 463
   network setup 464
   setup 464
Clustered NFS environment (CNFS)
   migration to CES 477
clustered NFS subsystem
   using 341
clusters
   accessing file systems 347
   configuring 520
   exporting data 538
   exporting output data 538
   ingesting data 538
CNFS 341
CNFS (Cluster NFS environment
   migration to CES 477
CNFS (Clustered NFS) environment
   administration 465
   configuration 465
   failover 463
   implementing
      Linux 463
   load balancing 464
   locking 464
   monitoring 463
   network setup 464
   setup 464
co-resident state migration 670
collecting
   performance metrics 72
commands
   active 112
   chmod 312
   localityCopy 557
   mmaddcallback 370, 400
   mmadddisk 166, 366, 367
   mmaddnode 2, 714
   mmapplypolicy 140, 367, 370, 371, 391, 392, 394, 399, 400, 402, 405, 407
   mmauth 16, 347, 350, 354, 356, 358, 359
   mmbackup 140, 142, 144, 145, 413
   mmbackupconfig 140
   mmces 468, 470, 471
   mmchattr 127, 128, 136, 367
   mmchcluster 4, 713
   mmchconfig 6, 15, 16, 20, 38, 339, 340, 347, 354, 356, 361, 467, 714, 715
   mmchdisk 122, 136, 170, 171, 367
   mmcheckquota 120, 293, 300, 303
   mmchfileset 416
   mmchfs 126, 174, 293, 302, 303, 340
   mmchnsd 173, 713
   mmchpolicy 370, 400, 401, 402, 566
   mmchqos 134
   mmcrcluster 1, 347
   mmcrfileset 414
   mmcrfs 115, 293, 302, 315, 340, 366
   mmcrnsd 165, 166
   mmcrsnapshot 147, 421
   mmdefragfs 138, 139
   mmdelacl 314, 315, 319
   mmdeldisk 167, 367
   mmdelfileset 415
   mmdelfs 119
   mmdelnode 3, 714
   mmdf 137, 167, 361, 364, 368
   mmeditacl 314, 315, 317, 318
   mmedquota 293, 297
   mmexportfs 715
   mmfsck 120, 122, 167, 361
   mmgetacl 312, 313, 317, 318, 319
   mmgetlocation 554
   mmimportfs 715
   mmlinkfileset 411, 414, 415, 416
   mmlsattr 127, 368, 410, 416
   mmlscluster 1, 355
   mmlsconfig 361
   mmlsdisk 122, 170, 361, 715
   mmlsfileset 412, 413, 415, 416
   mmlsfs 125, 170, 302, 303, 340, 361, 367
   mmlsmgr 21
   mmlsmount 120, 361
   mmlsnsd 165, 713
   mmlspolicy 401
   mmlsqos 134
   mmlsquota 301
   mmmount 115, 116, 174, 355
   mmputacl 312, 313, 315, 318, 319
   mmquotaoff 302, 303
   mmquotaon 302
   mmremotecluster 347, 355, 359, 361
   mmremotefs 174, 347, 355, 361
   mmrepquota 303
   mmrestorefs 413
   mmrestripefile 367
   mmrestripefs 136, 137, 167, 170, 367, 369, 519
      completion time 22
   mmrpldisk 169, 367
   mmsetquota 293
   mmshutdown 23, 467, 713
   mmsnapdir 413, 421
   mmstartup 23, 467, 713
   mmumount 118
   mmunlinkfileset 411, 415, 416
   mmuserauth 183, 190
Commands
   mmnetverify 113
commandsmmapplypolicy 424
commandsmmdelshapshot 425
common GPFS command principles 109, 110
communications I/O
   Linux nodes 42
COMPRESS 374
CONCAT 388
configuration
   ID mapping 223
configuration and tuning settings
   access patterns 40
   aggregate network interfaces 40
   AIX settings 43
      use with Oracle 43
   clock synchronization 37
   communications I/O 42
Index 753
DAYSINMONTH 390 disks (continued)
DAYSINYEAR 390 I/O settings 42
deactivating quota limit checking 303 managing 165
declustered array stanza 110 maximum number 165
default ACL 312 replacing 168, 169
default quotas 294 status 170
DELETE rule 371, 377 storage pool assignment
deleting changing 367
a GPFS cluster 3 strict replication 170
file systems 119 displaying
nodes from a GPFS cluster 3 access control lists 313
snapshots 425 disk fragmentation 138
deleting a cloud storage account 57 disk states 170
deleting a CSAP disks 165
configuring Transparent cloud tiering 60 quotas 301
deleting cloud objects 674 distributedTokenServer 715
deleting files dynamic validation of descriptors on disk 122
manually 674
deletion of data, secure 565
deploy WORM on IBM Spectrum Scale 77
deploying
E
EINVAL 371
WORM solutions 83
enabling
deploying WORM solutions 83
Persistent Reserve 174
Deploying WORM solutions
QOS 270
IBM Spectrum Scale 79
enabling cluster for IPv6
set up private key and private certificate 79
IPv6, enabling a cluster for 714
designating
encrypted file
transparent cloud tiering node 55
remote access 596
Direct I/O caching policy 127
encryption 565
direct I/O considerations 345
encryption-enabled environment 577
DIRECTORIES_PLUS 375
local read-only cache (LROC) 651
directory server 186
regular setup 606
DirInherit 316
simplified setup 577
disabling
simplified tasks 600
Persistent Reserve 174
standards compliance 650
QOS 270
Encryption
disaster recovery
external pools 652
establishing 441
Encryption (IBM Spectrum Archive Enterprise Edition) 652
GPFS replication 441
Encryption (IBM Spectrum Protect) 652
configuring 443
Encryption (IBM Spectrum Scale Transparent Cloud Tiering) 652
IBM ESS FlashCopy 459
IBM TotalStorage
encryption and backup/restore 651
active-active cluster 451
encryption and FIPS compliance 650
active-passive cluster 455
encryption and NIST compliance 650
overview 439
encryption and secure deletion 648
disaster recovery procedure (Cloud services) 680
encryption and snapshots 651
disk availability 170
encryption hints 647
disk descriptor 366
ENCRYPTION IS policy rule 566
disk descriptors 166, 168
encryption key cache purging 649
disk discovery 174
encryption keys 565
disk failure
encryption policies 566
stopping auto recovery 547
encryption policies, rewrapping 570
disk failures 546
encryption policy example 569
disk replacement 560
encryption policy rules 566
disk state 546
encryption setup requirements 571
changing 171
encryption-enabled environment 577
displaying 170
regular setup 606
disk status 170
simplified setup 577
diskFailure Event 562
ENCRYPTION, SET (policy rule) 566
disks
Error messages
adding 166
certificates 636
availability 170
establishing disaster recovery
deleting 167
cluster 440
displaying information 165
establishing quotas 297
ENOSPC 170
EtherChannel 40
failure 136
example of encryption policy 569
fragmentation 138
file systems (continued) filesets (continued)
backing up 140, 144 unlinking 416
changing mount point on protocol nodes 118 with mmbackup 413
checking 120 FIPS compliance and encryption 650
controlled by GPFS 344 FIPS1402mode 650
disk fragmentation 138 Firewall
exporting 338, 715 REST API 726
exporting using NFS 337 firewall considerations 729
file audit logging 91 firewall ports
format changes 161 examples of opening 730
format number 161 firewall recommendations
fragmentation call home 729
querying 139 file audit logging 729
GPFS control 344 installation 719
granting access 356 NTP 719
Linux export 338 protocols access 721
mounting on multiple nodes 116 SKLM 726
NFS export 339 Vormetric DSM 726
NFS V4 export 340 FlashCopy consistency groups 459
physical connection 347 FOR FILESET 375
reducing fragmentation 139 FPO 515, 519, 520
remote access 356 configuration changes 522
remote mount 354 configuring 520
remote mount concepts 347 distributing data 519
repairing 120 pool file placement and AFM 520
restriping 136 restrictions 564
revoking access 356 upgrading 539
security keys 359, 360 FPO cluster 544
snapshots 421 FPO clusters
space, querying 137 administering 541
unmounting on multiple nodes 118 monitoring 541
user access 349 monitoring, administering 541
virtual connection 347 FROM POOL 375
File-based configuration for performance monitoring tool 74 functions
FileInherit 316 CHAR 387
files CONCAT 388
/.rhosts 38 CURRENT_DATE 389
/etc/group 349 CURRENT_TIMESTAMP 389
/etc/passwd 349 DAY 389
/var/mmfs/ssl/id_rsa.pub 355, 359 DAYOFWEEK 389
ill-placed 369 DAYOFYEAR 389
pre-migrating 407 DAYS 390
storage pool assignment 367 DAYSINMONTH 390
files, stanza 110 DAYSINYEAR 390
fileset snapshots HEX 388
subset restore 153 HOUR 390
filesets INT 389
attributes 416 INTEGER 389
changing 416 LENGTH 388
backing up 140, 142 LOWER 388
block allocation 412 MINUTE 390
cautions 415 MOD 389
creating 414 MONTH 390
deleting 415 QUARTER 390
dependent 410 REGEX 388
in global snapshots 412 REGEXREPLACE 388
independent 410 SECOND 390
inode allocation 412 SUBSTR 388
linking 415, 416 SUBSTRING 389
managing 414 TIMESTAMP 390
names 414 UPPER 389
namespace attachment 411 VARCHAR 389
overview 410 WEEK 390
quotas 293, 412 YEAR 390
root 410, 415, 416
snapshots 413
storage pool usage 412
IBM Spectrum Protect (continued) IBM Spectrum Scale (continued)
for IBM Spectrum Scale 149, 150 change GPFS disk states 171
scheduling backups 148 change GPFS parameters 171
IBM Spectrum Protect backup planning 148 change NSD configuration 173
dsm.opt options 150 change Object configuration values 248
dsm.sys options 149 change quota limit checking 303
IBM Spectrum Protect backup scheduler 148 changing NFS export configuration 238
IBM Spectrum Protect Backup-Archive client changing the GPFS cluster configuration data 4
cautions check quota 300
unlinking 416 cluster configuration information 1
IBM Spectrum Protect interface 144 configuration 528
IBM Spectrum Protect Manager backup planning 151 configuring 37, 38, 39, 40, 41, 42, 43, 55
IBM Spectrum Scale 87, 88, 91, 107, 108, 109, 110, 118, 226, 234, 247, 261, 284, 286, 288, 481, 482, 485, 486, 487, 489, 491, 492, 493, 495, 498, 501, 502, 504, 507, 508, 509, 510, 511, 512, 513, 515, 519, 520, 521, 522, 523, 525, 538, 539, 541, 564, 565, 566, 571, 647, 648, 650, 651, 655, 657, 659, 665, 701, 702, 703, 705, 707, 708, 709, 713, 714, 715, 716, 718, 727, 731, 732, 735
configuring and tuning 57, 667
configuring CES 55
configuring cluster 1
continuous replication of data 441
create export on container 273
create NFS export 237
access control lists 313, 317, 327 create SMB share ACLs 232
administration 315 Create SMB shares 234
applying 318 Creating SMB share 231
change 314, 318 creating storage policy 272
delete 314, 319 data ingestion 279
display 318 data integrity 440
exceptions 319 deactivating quota limit checking 303
limitations 319 delete disk 167
setting 312, 318 deleting node 3
syntax 315 disaster recovery 451
translation 317 disaster recovery solutions 459
access control lists (ACL) disconnect active connections to SMB 236
best practices 323 disk availability 170
inheritance 322 disk status 170
permissions 324 Disks in a GPFS cluster 165
ACL administration 311 display GPFS disk states 170
activating quota limit checking 302 enable file-access object capability 269
active connections to SMB export 236 Enable object access 273
Active File Management 93, 97, 103, 104 establish and change quotas 297
tuning NFS client 103 establishing
tuning NFS server 103 disaster recovery 441
Active File Management DR 99, 101, 103 establishing disaster recovery 440
Add disks 166 export file systems 338
adding CES groups in a cluster 34 file audit logging 87
adding node 2 file system quota report
administering unified file and object access 269 create 303
example scenario 274 file systems 338
AFM 93, 97, 103, 104 AIX export 339
configuration parameters 93 firewall ports 728, 730
NFS client 103 firewall recommendations 719, 720, 721, 725, 726, 729
parallel I/O configuration parameters 97 GPFS access control lists (ACLs)
tuning NFS server on home/secondary cluster 104 manage 311
tuning gateway node 103 GPFS cache usage 340
AFM DR 99, 101, 103 GPFS quota management
configuration parameters 99 disable 293
AFM-based DR enable 293
parallel I/O configuration parameters 101 identity management modes for unified file and object
apply ILM policy access 262
transparent cloud tiering 667 in-place analytics 276
associate containers 273 limitations 353
authentication transparent cloud tiering 699
integrating with AD server 183 limitations of unified file and object access 277
authorizing protocol users 319 Linux export 338
Backup 284 list NFS export 239
CES configuration list quota information 301
update 502 list SMB shares 233
CES IPs 28 local read-only cache 711
CES packages Manage default quotas 294
deploying 32 manage disk 165
IBM TotalStorage (continued) internal communication among nodes
configuration 452 firewall 720
failover 454 firewall recommendations 720
active-passive cluster 455 firewall recommendations for 720
configuration 456 internal storage pools
failover 458 files
ibmobjectizer service 267 purging 407
ID mapping managing 364
shared authentication 266 metadata 364
identity management mode for unified file and object access overview 364
local_mode 261 system 364
identity management modes unified file and object access system.log 364
unified_mode 262 user 364
identity management on Windows 481 IP addresses
ill-placed CES (Cluster Export Services) 468
files 369 private 356
ILM public 356
snapshot 409 remote access 356
ILM (information lifecycle management) 408 IP addresses, CNFS (Clustered NFS environment) 464
ILM (information lifecycle management) IP addresses, changing 713
overview 363
image backup 147
image restore 147
immutability
J
job
directories 417
mmapplypolicy 391
effects 417
phase 1 391
files 417
phase 2 392
integrated archive manager (IAM) modes 417
phase 3 394
import and export files 696
Jumbo Frames 42
import files after upgrade to 5.0.2 696
junction 411
importing and exporting files 692
importing files exported by using old version of Cloud
services 696
in-place analytics 276 K
inband DR 487 Kerberos based NFS access configuration
information lifecycle management (ILM) prerequisites 205
overview 363 key cache purging, encryption 649
information lifecycle management (ILM) 408 key clients
inheritance flags 322 configurations 593
inheritance of ACLs 315 key manager for Cloud services 62
DirInherit 316 keys, encryption 565
FileInherit 316 Keystone
Inherited 316 expired tokens 222
InheritOnly 316 Keystone tokens
Inherited 316 deleting 222
InheritOnly 316
installation
firewall 719
firewall recommendations 719
L
large file systems
installing GPFS, using mksysb 713
mmapplypolicy
installing Windows IDMU 482
performance 402
INT 389
LDAP
INTEGER 389
bind user requirements 185
integrate transparent cloud tiering metrics
LDAP server 184
with performance monitoring tool
LDAP user information 187
using GPFS-based configuration 73
LDAP-based authentication for file access 199
integrated archive manager (IAM) modes
LDAP with Kerberos 201
immutability 417
LDAP with TLS 200
integrating
LDAP with TLS and Kerberos 202
transparent cloud tiering
LDAP without TLS and Kerberos 203
performance monitoring tool 72
LDAP-based authentication for object access 215
integrating transparent cloud tiering
LDAP with TLS 216
performance monitoring tool 74
LDAP without TLS 215
internal communication
LENGTH 388
port numbers 720
LIMIT 376
recommended port numbers 720
limitations
CES NFS Linux 345
MMC (continued) mmrestripefs 136, 137, 167, 170, 367, 369, 519
SMB export view number of file locks 237 completion time 22
mmces 468, 470, 471 mmrpldisk 367
mmchattr 127, 128, 136, 367 mmsetquota 293
mmchcluster 4, 713 mmshutdown 23, 467, 713
mmchconfig 6, 16, 339, 340, 347, 354, 356, 361, 467, 714, 715 mmsmb
mmchconfig command 38 list SMB shares 233
mmchdisk 136, 170, 171, 367 mmsnapdir 413
mmcheckquota 293, 300, 303 mmstartup 23, 467, 713
mmchfileset 416 mmumount 118
mmchfs 126, 174, 293, 302, 303, 340 mmunlinkfileset 411, 415, 416
mmchnsd 713 mmuserauth 183, 190, 198
mmchpolicy 370, 400, 401, 402, 566 MOD 389
mmcloudgateway destroy modifying file system attributes 126
command 674 MONTH 390
mmcloudmanifest tool 694 mount file system
mmcrcluster 1, 347 GUI 117
mmcrfileset 414 mount problem
mmcrfs 115, 293, 302, 315, 340, 366 remote cluster 361
mmcrnsd 166 mounting
mmcrsnapshot 147 file systems 115
mmdefragfs 138, 139 mounting a file system
mmdelacl 314, 315, 319 an NFS exported file system 337
mmdeldisk 167, 367 mtime 343
mmdelfileset 415 multi-cluster environments
mmdelfs 119 upgrade 353
mmdelnode 3, 714 multi-cluster protocol environment
mmdelsnapshot 425 IBM Spectrum Scale 352
mmdf 137, 167, 361, 364, 368 multi-region object deployment
mmeditacl 314, 315, 317, 318 adding region 256
mmedquota 293, 297 administering 257
mmexportfs 715 exporting configuration data 257
mmfsck 167, 361 importing configuration data 257
mmgetacl 312, 313, 317, 318, 319 removing region 257
mmgetlocation 554 multicluster
mmimportfs 715 file system access 347
mmlinkfileset 411, 414, 415, 416 Multiple nodes failure without SGPanic 550
mmlsattr 127, 368, 410, 416 multiple versions of data
mmlscluster 1, 355 IBM Spectrum Scale 441
mmlsconfig 361 Multiprotocol export considerations
mmlsdisk 170, 361, 715 NFS export 242
mmlsfileset 412, 413, 415, 416 SMB export 242
mmlsfs 125, 170, 302, 303, 340, 361, 367
mmlsmgr 21
mmlsmount 120, 361
mmlsnsd 165, 713
N
Network configuration
mmlspolicy 401
CES (Cluster Export Services) 468
mmlsquota 301
Network File System (NFS)
mmmount 115, 116, 174, 355
cache usage 340
mmmsgqueue
exporting a GPFS file system 337
disable 87
interoperability with GPFS 337
file audit logging 87
synchronous writes 340
mmnetverify
unmounting a file system 340
command 113
Network Information Server 204
mmnfs export add command 237
network interfaces 40
mmnfs export change 239
Network Shared Disks
mmnfs export load 239
create 521
mmobj command
Network Shared Disks (NSDs)
changing Object configuration values 248
changing configuration attributes 173
mmputacl 312, 313, 315, 318, 319
network switch failure 551
mmquotaoff 302, 303
NFS
mmquotaon 302
quotas 296
mmremotecluster 347, 355, 359, 361
NFS automount 341
mmremotefs 174, 347, 355, 361
NFS export
mmrepquota 303
create NFS export 237
mmrestorefs 413
list NFS export 239
mmrestripefile 367
make NFS export change 239
options (continued) policy rules (continued)
nomtime 116 built-in functions (continued)
norelatime 116 miscellaneous 383
nosyncnfs 116 numerical 383
relatime 116 string 383
syncnfs 117 DELETE 407
useNSDserver 117 examples 394
Oracle EXTERNAL POOL 406
GPFS use with, tuning 43 m4 macro processor 398
orphaned files 120 overview 372
outband DR 489 SQL expressions in 379
syntax 372
terms 374
P tips 394
types 372
packages
policy rules, encryption 566
gpfs.gskit 354
pools, external
pagepool parameter
requirements 369
usage 38
port usage, IBM Spectrum Scale 716
parents
PR 174
file clones 431
pre-migrating files
performance
cloud storage tier 670
access patterns 40
pre-migration
aggregate network interfaces 40
overview 407
disk I/O settings 42
prefetchThreads parameter
mmapplypolicy 402
tuning
monitoring using mmpmon 37
on Linux nodes 41
setting maximum amount of GPFS I/O 43
use with Oracle 43
Performance Monitoring tool
preparations for SOBAR 681
firewall 727
prerequisite
performance tuning
Kerberos-based SMB access 191
object services 179
prerequisites
performing
Kerberos based NFS access 205
rolling upgrade 542
LDAP server 184
Persistent Reserve
prerequisites and preparations for SOBAR (Cloud services) 681
disabling 174
enabling 174
primary site 681
physical disk stanza 110
recovery site 682
physically broken disks 548
primary site preparations for SOBAR 681
policies
principles
assigning files 400
common to GPFS commands 109, 110
changing active 401
Procedure for SOBAR (Cloud services) 680
creating 400
protection of data 565
default 402
protocol data
default storage pool 400
security 657
deleting 402
protocol data security
error checking 370
protocol, data security 659
external storage pools
protocol node
managing 400
remove from cluster 34
file management 370
protocol nodes
file placement 370, 412
firewall 718
installing 400
firewall recommendations 718
listing 401
protocol over remote cluster mounts 350
overview 370
protocol support on remotely mounted file system
policy rules 372
limitations 353
SET POOL 400
protocols
validating 402
administration tasks 25, 33
policies, encryption 566
removal tasks 25
policies, rewrapping 570
protocols access
policy example, encryption 569
CES IP aliasing 28
policy files
port usage 721
file clones 432
protocols data exports 231
policy rule, ENCRYPTION IS 566
protocols disaster recovery
policy rule, SET ENCRYPTION 566
authentication configuration 511
policy rules 128
authentication configuration failback 512
built-in functions
authentication configuration failover 512
date and time 383
authentication configuration restore 512
extended attribute 383
REGEXREPLACE 388 restoring the locality for files
remapping with WADFG 559
group id 349 without WADFG 558
user id 349 restrictions and tuning for highly-available write cache 708
remote access restriping a file system 136
AUTHONLY 358 revoking
displaying information 361 old certificate 83
encrypted file 596 rewrapping policies 570
file system 354 RKM back ends 571
IP addresses 356 RKM server setup 571
managing 356 rolling upgrades 542
mount problem 361 root authority 38
restrictions 361 root fileset 415, 416
security keys 359 root squash 350
security levels 358 root squashing 349
updating 361 root-level processes
remote cluster sudo wrappers 20
displaying information 361 rotating
mount problem 361 client key 83
restrictions 361 rule (policy), ENCRYPTION IS 566
updating 361 rule (policy), SET ENCRYPTION 566
remote key management server setup 571 RULE clause
remote mounting ACTION 374
firewall considerations 728 COMPRESS 374
Renew certificates 636, 643 DIRECTORIES_PLUS 375
repairing EXCLUDE 375
file system 120 FOR FILESET 375
replace broken disks 561 FROM POOL 375
replace more than one active disks 561 GROUP POOL 376
replacing disks 168, 169 LIMIT 376
REPLICATE 376 REPLICATE 376
REPLICATE clause 376 SET POOL 377
replication SHOW 377
changing 127 THRESHOLD 377
querying 127 TO POOL 378
storage pools 369 WEIGHT 378
system storage pool 364 WHEN 378
replication of data WHERE 379
IBM Spectrum Scale 441 rules, encryption policy 566
requirements
administering GPFS 107
external pools 369
for IBM Spectrum Scale 142
S
S3 ACLs
requirements (setup), encryption 571
managing 250
REST API
S3 API
firewall 726
enabling 248
firewall recommendations for 726
samba attributes 187
restarting
Scale out backup and restore procedure (Cloud services) 680
IBM Spectrum Scale cluster 544
Scale Out Backup and Restore 147
restore
scale out backup and restore (SOBAR), backup 433
file system
scale out backup and restore (SOBAR), overview 433
SOBAR 435
scale out backup and restore (SOBAR), restore 435
storage pools 408
scheduling the maintenance activities 67
restore option
script
transparent cloud tiering 676
external pool 406
Restore option
SECOND 390
Cloud services configuration 678
secure deletion of data 565
restore procedure (using SOBAR) 684
secure deletion, encryption 648
restore/backup and encryption 651
secure protocol data 657
Restoring
security
Cloud services configuration 678
firewall recommendations 718
Restoring deleted files
security key
from cloud storage tier 676
changing 360
Restoring files
security keys
transparent cloud tiering 676
remote access 359
restoring from local snapshots
security levels
using the sample script 155
AUTHONLY 358
stanza, declustered array 110 subroutines (continued)
stanza, NSD 110 gpfs_iopen() 147
stanza, physical disk 110 gpfs_iopen64() 147
stanza, recovery group 110 gpfs_iread() 147
stanza, virtual disk 110 gpfs_ireaddir() 147
starting gpfs_ireaddir64() 147
Cloud services 56 gpfs_next_inode() 147
transparent cloud tiering service 56 gpfs_next_inode64() 147
starting and stopping ibmobjectizer 269 gpfs_open_inodescan() 147
starting consumers gpfs_open_inodescan64() 147
file system 701 gpfs_quotactl() 294
starting GPFS 22, 23 SUBSTR 388
before starting 22 SUBSTRING 389
status sudo wrapper 17
disk 170 sudo wrapper scripts
steps for SOBAR in Cloud services 680 configuring on existing cluster 18
stopping configuring on new cluster 18
transparent cloud tiering service 696 sudo wrappers
stopping consumers root-level processes 20
file system 701 swap space 40
stopping GPFS 22, 23 swift workers
storage tuning 179
partitioning 363 synchronous mirroring
storage management GPFS replication 441
automating 363 using storage-base replication 451
tiered 363 syntax
storage policies 253, 254, 255 policy rules 372
storage policies for object system storage pool 366, 367, 371
administering 254 deleting 367
compression 254 highly reliable disks 364
encryption 255 replication 364
mapping to filesets 253 system.log pool
storage pools deleting 367
backup 408 system.log storage pool
creating 366 definition 365
deleting 367
disk assignment
changing 367
external
T
TCP window 42
working with 403
temporary snapshot
file assignment 367
backing up to the IBM Spectrum Protect server 143
files
tenants
listing fileset of 368
configurations 593
listing pool of 368
terms
listing 367
policy rules 374
listing disks in 368
THRESHOLD 377
managing 366
TIMESTAMP 390
names 366
tivoli directory server
overview 363
ACLs 186
rebalancing 369
Tivoli Storage Manager 369
replication 369
TO POOL 378
restore 408
transparent cloud tiering 667
subroutines
administering files 667
gpfs_fgetattrs() 408
automatically applying a policy 667
gpfs_fputattrs() 408
clean up files 674
gpfs_fputattrswithpathname() 408
database recovery 679
system storage pool 364
dmremove commands 674
system.log storage pool 365
limitations 699
user storage pools 365
listing files migrated to the cloud 676
storage replication
migrating files 670
general considerations 440
recall files 672
storage-base replication
ZIMon integration 72
synchronous mirroring 451
Transparent cloud tiering
subnet 356
cloud storage access points 60
subnets 714
configuration 60
subroutines
creating a key manager 62
gpfs_iclose() 147
creating container pairs 63
Windows
auto-generated ID mappings 481
configuring ID mappings in IDMU 482
identity management 481
IDMU installation 482
installing IDMU 482
worker1Threads parameter
tuning
on Linux nodes 41
use with Oracle 43
WORM solution
deploying 77
WORM solutions
deploying 81
deployment 77
set up Transparent cloud tiering 78
write cache, highly available 707
write file permission 311
write once read many
solutions 77
Y
YEAR 390
Printed in USA
SC27-9288-01