ZFS

From Wikipedia:

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs. The ZFS name is registered as a trademark of Oracle Corporation.

FreeBSD, and thus by extension PacBSD, has had native ZFS support since FreeBSD 7.0.

Choosing between UFS and ZFS

ZFS is one of the most advanced file systems available, and PacBSD has built-in support for it. ZFS has several technological advantages over the more traditional UFS:

  • Built in support for multiple device-based storage layouts
    • Support for various RAID configurations
    • Support for mirrored disks
  • End-to-end checksums
  • True live filesystem integrity checks of both metadata and data, whereas fsck only checks metadata and requires the filesystem to be unmounted
  • Transparent data compression
  • ZFS is designed to be a high capacity filesystem
  • Native support for encryption (encryption happens after compression, and before checksumming and deduplication)
  • Various cache support and management to speed up read and write operations
  • Copy-on-write transactional model
  • Snapshots and clones
  • Send and receive snapshots between multiple computers
  • Dynamic striping
  • Variable block sizes
  • Lightweight filesystem creation
  • Data deduplication

The trade-off for some of these features is higher CPU and RAM usage, making ZFS a less than ideal choice for older computers with slower CPUs or a minimal amount of RAM. A good rule of thumb is to have 1GB of RAM plus an additional 1GB of RAM for each 1TB of storage space; for example, a machine with a 4TB pool should have roughly 1GB + 4GB = 5GB of RAM. If data deduplication is enabled then the requirement becomes 5GB of RAM for every 1TB of storage. It is also highly recommended to only use ZFS on 64-bit systems; while it may work on 32-bit systems, there may be stability issues.

Compression

ZFS has built-in support for transparently compressing data. Not only does enabling this save space in the pool, but in some cases it can drastically improve performance, because compressing or decompressing the data is often quicker than reading or writing the uncompressed data to disk.

Supported compression options are:

  • LZ4
  • LZJB
  • GZIP

LZ4 is the recommended compression algorithm, as it offers the best compression and the best performance of the three. LZJB makes for a good second choice, providing a good trade-off between speed and space. Gzip is no longer recommended but is still supported; as with other gzip implementations, the compression level is configurable from 1 (the least compression) to 9 (the most), and by default ZFS uses gzip level 6.
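
For example, to explicitly select a gzip level on a dataset (tank/PORTS here is taken from the listing below):

# zfs set compression=gzip-9 tank/PORTS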

To see which, if any, datasets use compression, use:

# zfs get -r compression tank
NAME                                                                       PROPERTY     VALUE     SOURCE
tank                                                                       compression  off       default
tank/HOME                                                                  compression  lz4       local
tank/HOME/root                                                             compression  off       local
tank/PORTS                                                                 compression  lz4       local
tank/ROOT                                                                  compression  off       default
tank/ROOT/pacbsd-0                                                         compression  lz4       local

The -r flag tells zfs get to work recursively, returning not only the data for the tank pool but also for all datasets under it.
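
To query a single dataset without recursing, simply drop the -r flag, for example:

# zfs get compression tank/HOME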

To enable compression on a dataset use:

# zfs set compression=lz4 tank/HOME/root

To see the ratio of space saved with compression use:

# zfs get compressratio tank
NAME  PROPERTY       VALUE  SOURCE
tank  compressratio  2.62x  -

Data Integrity

All data and metadata written in ZFS is checksummed to ensure that the data has not become corrupted over time. These checksums are used to validate the integrity of the data by catching things like bit rot or early-stage drive failure. When a block is accessed, regardless of whether it is data or metadata, its checksum is calculated and compared with the stored checksum value. If the checksums match, the data is processed normally; if they do not match, ZFS will try to repair the block by fetching a copy from a mirrored disk with a valid checksum or by reconstructing it from RAID-Z parity.

To see which checksum algorithm is in use run:

# zfs get checksum tank
NAME  PROPERTY  VALUE      SOURCE
tank  checksum  sha256     local

While the checksum algorithm can be changed after the fact, already existing checksums will only be regenerated when the file(s) are rewritten. The easiest way to do this is with the zfs send and zfs receive commands: the data can either be sent to an intermediate machine that also uses ZFS, or be written straight back into the same pool, cutting out the need for a second computer. If using a second computer, it must have a ZFS pool big enough to hold all the data from the pool being rechecksummed; if not using a second computer, the pool must be large enough to hold a second copy of all the data. Either way, depending on the amount of data in the pool, this process may take some time to complete.


Note: The zfs send and zfs receive commands are out of scope of this section and are explained elsewhere.
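
As a rough sketch only (the snapshot name @rewrite and the target dataset are arbitrary examples), rewriting a dataset such as tank/PORTS back into the same pool might look like:

# zfs snapshot tank/PORTS@rewrite
# zfs send tank/PORTS@rewrite | zfs receive tank/PORTS.new
# zfs destroy -r tank/PORTS
# zfs rename tank/PORTS.new tank/PORTS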

Setting the active checksum algorithm:

# zfs set checksum=<checksum> tank

Where <checksum> is one of fletcher2, fletcher4 or sha256.

While the checksums are automatically checked when accessing data, the system administrator can also manually trigger checking the checksums for the entire pool:

# zpool scrub tank

This starts the scan/scrub in the background and no information is presented to the user.

To see the status of the most recent scrub for a pool use zpool status:

# zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 4h28m with 0 errors on Sun Mar 27 07:28:54 2016
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    ada0p3  ONLINE       0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    ONLINE       0     0     0

errors: No known data errors

This shows that across all drives there are no read/write errors and all checksums match.

Boot Environments

ZFS supports booting from different root datasets within a pool, commonly referred to as boot environments. This allows rolling back after various mistakes or problems, such as a failed system update or the deletion of an important file or directory.

Note: This section is a stub. Details on creating and using boot environments still need to be added, and it is unclear whether sysutils/beadm is still needed on FreeBSD 10.x and later.
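
Assuming sysutils/beadm is installed, a minimal sketch of saving a boot environment before a system update, and switching back to it if the update goes wrong, might look like (the name pre-update is arbitrary):

# beadm create pre-update
# beadm list
# beadm activate pre-update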

Data Deduplication

On top of built-in compression support, ZFS can save a lot of disk space by using data deduplication, at the cost of higher RAM requirements. In short, deduplication allows storing the same data multiple times while only taking up the space of a single copy. Depending on the system and the kind of data being written, this can lead to substantial differences in storage space used. ZFS is capable of deduplicating data on the file, block or byte level, making it very versatile in practice. An example of using data deduplication would be storing multiple copies of virtual machine images where the data is fairly consistent between them.

# zfs create tank/VMs
# zfs set dedup=on tank/VMs
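
To see how much space deduplication is saving pool-wide, the read-only dedupratio pool property can be checked:

# zpool get dedupratio tank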

Optionally, deduplication can be set to use extra verification to help avoid potential hash collisions. The downside is that this adds extra overhead to both the checksumming and deduplication process. If the pool is set to use SHA-256 as the checksum hashing algorithm, then the chance of a hash collision is low enough that this probably isn't needed.

# zfs set dedup=verify tank/VMs

If the pool is set to use SHA-256 and collision verification is enabled on the dataset, it is possible to tell ZFS to use a faster but weaker checksum for deduplication on just this dataset to lessen the performance hit:

# zfs set dedup=fletcher4,verify tank/VMs

File system creation

Note: This guide uses /dev/ada0 and /dev/ada0p2 as examples; make sure to use the correct drive and partition when creating your file system.

This guide is only an example of setting up a very basic ZFS pool; while this may be good enough for most users, it is by no means a recommended setup.

Creating the pool

ZFS can be used either on a single disk or across multiple disks in either a mirror or RAID-Z setup. Mirroring two or more drives offers the greatest redundancy, as everything written to one drive is duplicated to the others, while RAID-Z setups allow for striping data across multiple drives, creating redundancy across the disks, or both. With enough drives it is also possible to use both mirroring and RAID-Z.

Single disk pool creation:

# zpool create tank /dev/ada0p2

This creates a pool called tank on /dev/ada0p2.

Multiple disk pool creation:

# zpool create media raidz1 /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4

This creates a pool called media that spans four drives, with enough redundancy to survive one drive failure. Valid options are raidz1, which allows for one drive failure, raidz2, which allows for two drive failures, and raidz3, which allows for three drive failures.
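
For example, a raidz2 pool spanning five drives (the drive names are only examples), able to survive two simultaneous drive failures, could be created with:

# zpool create media raidz2 /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4 /dev/ada5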

To create mirrored pools, where one drive is an exact copy of another drive, use:

# zpool create media mirror /dev/ada1 /dev/ada2 mirror /dev/ada3 /dev/ada4

This makes the pool so that /dev/ada2 is a mirror of /dev/ada1 and /dev/ada4 is a mirror of /dev/ada3.

To create a striped pool, where there is no data redundancy, omit mirror and raidzN from the zpool create command:

# zpool create media /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4

This creates one large pool that contains the cumulative free space of all the vdevs, minus the space reserved by ZFS to store metadata.

Listing all ZFS pools

It is possible to list all ZFS pools known to the system, and get a brief overview of them, with zfs list:

# zfs list
NAME                                                    USED  AVAIL  REFER  MOUNTPOINT
tank                                                   30.5G   419G  25.3K  /tank
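
For pool-level details such as total size, allocated space and overall health, zpool list can also be used:

# zpool list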

Exporting and Importing the pool

If this pool is going to be used for installation then it will need to be remounted from / to /mnt. Before remounting (exporting and reimporting), create a small ramdisk for /boot/zfs:

Note: The ramdisk step may no longer be needed; creating it doesn't hurt anything if done.

# mdmfs -s 128m md /boot/zfs

Note: The cache file used below may no longer be needed.

Remount the pool so it is mounted to /mnt:

# zpool export tank
# zpool import -o altroot=/mnt -o cachefile=/boot/zfs/zpool.cache -f tank

Setting the checksum algorithm

As explained in the Data Integrity section above, all data and metadata written in ZFS is checksummed so that corruption can be detected and, where possible, repaired automatically.

ZFS currently supports the fletcher2 and fletcher4 checksums and the sha256 hash. Fletcher4 is the default, as SHA-256 is generally more CPU intensive to calculate; on a fairly recent machine that isn't constantly under heavy load, choosing SHA-256 over Fletcher4 should be fine.

To change the checksum algorithm on the pool run:

# zfs set checksum=sha256 tank

Creating datasets

One of the advantages ZFS has to offer is support for multiple datasets (subvolumes or file systems within a file system). Datasets are created with zfs create and can be passed arguments similar to what one would pass to mount.

# zfs create -o canmount=off -o mountpoint=legacy tank/ROOT
# zfs create -o canmount=on -o compression=lz4 -o mountpoint=/ tank/ROOT/pacbsd
# zfs create -o compression=lz4 -o mountpoint=/home tank/HOME
# zfs create -o compression=off -o mountpoint=/root tank/HOME/root

This creates four datasets. The first is a global dataset (tank/ROOT) that isn't directly mountable by the system (canmount=off) and whose mounting is managed by the administrator (mountpoint=legacy). After that, separate datasets for / and /home are created, and all data written to either dataset will automatically be compressed with the LZ4 algorithm (compression=lz4). Finally, a separate dataset is created for /root that doesn't use compression; as using the root account is highly discouraged, there is little point in setting compression on root's home directory.
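
A quick way to verify the resulting layout and properties is to list the relevant columns (the exact output will vary):

# zfs list -r -o name,mountpoint,canmount,compression tank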

Swap on ZFS

It is possible to create a dataset under ZFS to use as swap space. For directions on how to set this up, see Swap#ZFS_Swap_Volume.

See Also