ZFS Cheatsheet
This is a quick and dirty cheatsheet on Sun's ZFS
## raidz is a non-standard distributed parity-based software raid level. One common problem, called the "write hole", is eliminated because in
## raidz the data and the stripe are written simultaneously; basically, if a power failure occurs in the middle of a write then you have the
## data plus the parity or you don't. ZFS also supports self-healing: if it cannot read a bad block it will reconstruct it using the
## parity, and repair it or indicate that the block should not be used.
## You should keep the raidz array at a low power of two plus parity:
Raidz1/2/3
raidz1 - 3, 5, 9 disks
raidz2 - 4, 6, 10, 18 disks
raidz3 - 5, 7, 11, 19 disks
## the more parity bits, the longer it takes to resilver an array; standard mirroring does not have to recreate the parity,
## so it is quicker at resilvering
## raidz is more like raid3 than raid5, but it does use parity to protect against disk failures
raidz/raidz1 - minimum of 3 devices (one parity disk), you can suffer a one disk loss
raidz2 - minimum of 4 devices (two parity disks), you can suffer a two disk loss
raidz3 - minimum of 5 devices (three parity disks), you can suffer a three disk loss
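The "low power of two plus parity" guideline above is simple arithmetic: an n-disk raidz vdev with p parity disks has n - p data disks, and the guideline asks that n - p be a power of two. A minimal sketch to check a layout (the function names are my own, not ZFS commands):

```shell
#!/bin/sh
# raidz_data_disks TOTAL PARITY -> number of data disks in the vdev
raidz_data_disks() {
  echo $(( $1 - $2 ))
}

# is_pow2 N -> succeeds if N is a power of two (the guideline for data disks)
is_pow2() {
  n=$1
  [ "$n" -ge 1 ] || return 1
  while [ $(( n % 2 )) -eq 0 ]; do n=$(( n / 2 )); done
  [ "$n" -eq 1 ]
}

# e.g. a 10-disk raidz2: 8 data disks, a power of two, so it follows the guideline
raidz_data_disks 10 2
is_pow2 "$(raidz_data_disks 10 2)" && echo "follows the guideline"
```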
spare
hard drives marked as "hot spare" for a ZFS raid; by default hot spares are not used on a disk failure, you must turn on the
"autoreplace" feature.
cache
The Linux caching mechanism uses a least recently used (LRU) algorithm: basically first in, first out (FIFO), blocks are
moved in and out of the cache. ZFS caching is different: it caches both least recently used (LRU) block requests and least frequently
used (LFU) block requests; the cache device uses a level 2 adaptive replacement cache (L2ARC).
log
ZFS intent log (ZIL) - a logging mechanism where all the data to be written is stored, then later flushed as a transactional
write; this is similar to a journaled filesystem (ext3 or ext4).
Separate intent log (SLOG) - a separate logging device that caches the synchronous parts of the ZIL before flushing them to
the slower disk; it does not cache asynchronous data (asynchronous data is flushed directly to disk). If a SLOG exists
the ZIL will be moved to it rather than residing on platter disk, and everything in the SLOG will always also be in system memory.
Basically the SLOG is the device and the ZIL is the data on the device.
Storage Pools
displaying
zpool list
zpool list -o name,size,altroot
Note: there are a number of properties that you can select, the default is: name, size, used, available, capacity, health, altroot
## zdb can view the inner workings of ZFS (zdb has a number of options)
zdb <option> <pool>
status
zpool status
http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm 1/7
1/11/2017 Sun ZFS cheatsheet
## Show only errored pools with more verbosity
zpool status -xv
statistics
zpool iostat -v 5 5
Note: use this command like you would iostat
history
zpool history -il
Note: once a pool has been removed its history is gone
creating
## perform a dry run but don't actually perform the creation (notice the -n)
zpool create -n data01 c1t0d0s0
## you can presume that I created two files called /zfs1/disk01 and /zfs1/disk02 using mkfile
zpool create data01 /zfs1/disk01 /zfs1/disk02
## you can also create raid pools (mirror, raidz/raidz1 - single parity, raidz2 - double parity, raidz3 - triple parity)
zpool create data01 raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0
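To experiment safely you can back a pool with plain files instead of real disks, as the mkfile example above does on Solaris; on Linux, truncate creates the same kind of sparse backing file. The zpool lines are commented out in this sketch because they need root and a ZFS-capable kernel, and the /tmp paths are just examples:

```shell
#!/bin/sh
# create two 64MB sparse backing files (the Linux equivalent of "mkfile 64m /zfs1/disk01")
mkdir -p /tmp/zfs1
truncate -s 64M /tmp/zfs1/disk01 /tmp/zfs1/disk02

# dry-run the pool creation first (-n), then create it for real:
# zpool create -n data01 /tmp/zfs1/disk01 /tmp/zfs1/disk02
# zpool create data01 /tmp/zfs1/disk01 /tmp/zfs1/disk02

ls -l /tmp/zfs1
```

A file-backed pool behaves like a real one for practising snapshots, clones and send/receive, but should never be used for real data.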
scrubbing
## stop a scrub in progress; check the scrub line using "zpool status data01" to see any errors
zpool scrub -s data01
Note: see the top of the table for more information about resilvering and scrubbing
http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm 2/7
1/11/2017 Sun ZFS cheatsheet
exporting
zpool export data01
## you can list exported pools using the import command
zpool import
zfs list
## complex listing
zfs list ‐o name,mounted,sharenfs,mountpoint
Note: there are a number of attributes that you can use in a complex listing, so use the man page to see them all
Note: all the normal mount options apply, i.e. ro/rw, setuid
http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm 3/7
1/11/2017 Sun ZFS cheatsheet
share
zfs set sharenfs=on data01
## specific hosts
zfs set sharenfs="[email protected]/24" data01/apache
snapshotting
## snapshotting is like taking a picture: delta changes are recorded to the snapshot when the original file system changes; to
## remove a dataset all previous snapshots have to be removed first. You can also rename snapshots.
## You cannot destroy a snapshot if it has a clone
## creating a snapshot
zfs snapshot data01@10022010
## renaming a snapshot
zfs rename data01@10022010 data01@keep_this
## destroying a snapshot
zfs destroy data01@10022010
rollback
## by default you can only roll back to the latest snapshot; to roll back to an older one you must delete all newer snapshots
zfs rollback data01@10022010
cloning/promoting
## clones are writable filesystems created from a snapshot; a dependency will remain on the snapshot as long as the
## clone exists. A clone uses the data from the snapshot to exist; as you change the clone it uses space separate from the snapshot.
## clones cannot be created across zpools, you need to use send/receive, see the topics below
## cloning
zfs clone data01@10022010 data03/clone
zfs clone -o mountpoint=/clone data01@10022010 data03/clone
## promoting a clone; this allows you to destroy the original file system that the clone is attached to
zfs promote data03/clone
Compression
## you enable compression by setting a property; the compression values are on, off, lzjb, gzip, gzip-[1-9] and zle. Note that it
## only starts compressing when you turn it on, existing data will not be compressed
zfs set compression=lzjb data03/apache
## you can get the compression ratio
zfs get compressratio data03/apache
Deduplication
## you can save disk space using deduplication, which can be at the file, block or byte level. For example, with file-level dedup each file is hashed with a
## cryptographic hashing algorithm such as SHA-256; if a file matches then we just point to the existing file rather than storing a
## new file. This is ideal for small files, but for large files a single character change means all the data has to be copied.
## Block deduplication allows you to share all the same blocks in a file minus the blocks that are different; this allows you to share
## unique blocks on disk and reference shared blocks in RAM. However it may need a lot of RAM to keep track of which blocks
## are shared and which are not; even so, it is the preferred option over file or byte deduplication. Shared blocks are
## stored in what is called a "deduplication table"; the more deduplicated blocks, the larger the table. The table is read every
## time a block is changed, thus the table should be held in fast RAM; if you run out of RAM the table will spill over onto disk.
## So how much RAM do you need? You can use the zdb command to check: take the "bp count" and allow about 320 bytes of RAM
## for each deduplicated block in the pool. In my case a bp count of 288674 means I would need about 92MB; a 200GB pool, for
## example, would need about 670MB for the table. A good rule is to allow 5GB of RAM for every 1TB of disk.
## to turn on deduplication
zfs set dedup=on data01/text_files
## to see a histogram of how many blocks are referenced how many times
zdb -DD <pool>
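The RAM estimate in the notes above is straightforward arithmetic: multiply the "bp count" reported by zdb by roughly 320 bytes per block. A sketch, assuming the 320-byte rule of thumb from the text rather than an exact per-block cost:

```shell
#!/bin/sh
# ddt_ram_mb BP_COUNT -> rough dedup-table size in MB, at ~320 bytes per block
ddt_ram_mb() {
  echo $(( $1 * 320 / 1000000 ))
}

# the example from the notes: a bp count of 288674 needs about 92MB
ddt_ram_mb 288674    # prints 92
```

In practice you would take the bp count from "zdb -b <pool>" output and feed it to a calculation like this before deciding whether the system has enough RAM to enable dedup.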
http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm 4/7
1/11/2017 Sun ZFS cheatsheet
properties
## List all the properties
zfs get all data03/oracle
Note: the source column denotes whether the value has been changed from its default value; a dash in this column means it is a
read-only value
Note: use the command "zfs get all <dataset>" to obtain a list of current settings
upgrade
## List all the datasets that are not at the current level
zfs upgrade
send/receive
## create mountpoints
mkdir /master
mkdir /slave
## create a snapshot and send it to the slave; you could use SSH or tape to transfer to another server (see below)
zfs snapshot master/data@1
zfs send master/data@1 | zfs receive slave/data
## set the slave to read-only, otherwise you can cause data corruption; make sure you do this before accessing anything in the
## slave/data directory
zfs set readonly=on slave/data
## create a second snapshot and send only the differences; you may get an error message saying that the destination has been
## modified, which means you did not set slave/data to read-only (see above)
zfs snapshot master/data@2
zfs send -i master/data@1 master/data@2 | zfs receive slave/data
------------------------------------------------------------------------
## using SSH
zfs send master/data@1 | ssh backup_server zfs receive backups/data@1
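The incremental replication step above lends itself to a small wrapper script. Since running it for real needs two ZFS hosts, this sketch only builds and prints the pipeline; the dataset and host names are taken from the example above and would be replaced with your own:

```shell
#!/bin/sh
# build_send_cmd DATASET FROM_SNAP TO_SNAP HOST TARGET
# -> prints the incremental send/receive pipeline for the given snapshots
build_send_cmd() {
  echo "zfs send -i $1@$2 $1@$3 | ssh $4 zfs receive $5"
}

# print the command rather than executing it; pipe through sh once verified
build_send_cmd master/data 1 2 backup_server backups/data
```

Printing the command first (rather than executing it blind) is a cheap way to sanity-check the snapshot order before an unattended cron job runs it.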
http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm 5/7
1/11/2017 Sun ZFS cheatsheet
allow
## display the permission sets and any user permissions
zfs allow master
Note: there are many permissions that you can set, so see the man page or just use the "zfs allow" command
Quota/Reservation
## Not strictly a command, but worth discussing here: you can apply a quota to a dataset. You can reduce this quota only if the
## quota has not already been exceeded; if you exceed the quota you will get an error message. You also have reservations, which
## guarantee that a specified amount of disk space is available to the filesystem. Both are applied to datasets and their
## descendants (snapshots, clones).
## Newer versions of Solaris allow you to set group and user quotas.
## You can also use refquota and refreservation to manage space without accounting for disk space consumed by descendants
## such as snapshots and clones. Generally you would set quota and reservation higher than refquota and refreservation.
quota & reservation - properties used for managing disk space consumed by datasets and their descendants
refquota & refreservation - properties used for managing disk space consumed by datasets only
## set a quota
zfs set quota=100M data01/apache
## get a quota
zfs get quota data01/apache
## List user quotas (use groupspace for groups); you can also list users with quotas, for example the root user
zfs userspace data01/apache
zfs get userused@vallep data01/apache
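Checking how close a dataset is to its quota comes down to comparing the used and quota properties; "zfs get -Hp used,quota <dataset>" prints them in scripted form (tab-separated, raw bytes). This sketch feeds sample output of that shape through awk; the byte values are made up for illustration:

```shell
#!/bin/sh
# percent_used reads "zfs get -Hp used,quota" style lines (name<TAB>property<TAB>value<TAB>source)
# on stdin and prints the integer percentage of the quota consumed
percent_used() {
  awk -F'\t' '$2=="used"{u=$3} $2=="quota"{q=$3} END{print int(u*100/q)}'
}

# sample output as it would come from: zfs get -Hp used,quota data01/apache | percent_used
printf 'data01/apache\tused\t52428800\t-\ndata01/apache\tquota\t104857600\tlocal\n' | percent_used
```

A check like this is handy in monitoring scripts, since exceeding the quota produces hard write errors rather than a warning.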
ZFS tasks
# scrub the pool to check for any more errors (this can take a long time to complete, depending on the size of the zpool)
zpool scrub data01
# you can now remove the failed disk in the normal way depending on your hardware
Expand a pool's capacity
# you cannot remove a disk from a pool but you can replace it with a larger disk
zpool replace data01 c1t0d0 c2t0d0
zpool set autoexpand=on data01
Install the boot block
# the command depends on whether you are using a SPARC or an x86 system
sparc - installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0
x86 - installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0
Lost root password
## option one
ok> boot -F failsafe
# when requested, follow the instructions to mount the rpool on /a
cd /a/etc
vi passwd|shadow
init 6
## option two
ok boot cdrom|net -s (you can boot from the network or cdrom)
zpool import -R /a rpool
zfs mount rpool/ROOT/zfsBE
cd /a/etc
http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm 6/7
1/11/2017 Sun ZFS cheatsheet
vi passwd|shadow
init 6
Primary mirror disk in root is unavailable or fails
## offline and unconfigure the failed disk; the options for unconfiguring a disk depend on the hardware
zpool offline rpool c0t0d0s0
cfgadm -c unconfigure c1::dsk/c0t0d0
# Now you can physically replace the disk, reconfigure it and bring it online
cfgadm -c configure c1::dsk/c0t0d0
zpool online rpool c0t0d0
# Let the pool know you have replaced the disk
zpool replace rpool c0t0d0s0
# if the replace above fails then detach and reattach the primary mirror
zpool detach rpool c0t0d0s0
zpool attach rpool c0t1d0s0 c0t0d0s0
# make checks
zpool status rpool
Resize the swap area
# You can resize the swap area if it is not being used; first record the size and check whether it is being used
swap -l
Note: if you cannot delete the original swap area because it is too busy then simply add another swap area; the same procedure is
used for dump areas but using the "dumpadm" command
http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm 7/7