Migrate Linux (home)server to ZFS

Recently I moved my home server to a (refurbished) Dell Optiplex 7050. The previous PC was on ext4, and I decided to try ZFS without reinstalling the OS.

Why ZFS?

ZFS is an enterprise-grade file system. It supports a very long list of features, most of which target big data storage. However, some of them are nice-to-haves in a home server. In my case:

The ability to transfer ZFS snapshots is fantastic, although I don’t think I’ll use it much at home. I’ll keep doing my backups with restic: relying on a single tool (like ZFS snap/send/recv for backups) might expose me to some weird bug, like the one present in some versions of ZFS in Ubuntu (note that even ZFS snapshots were corrupted in that case).

What about btrfs? Well, if you take a look at the btrfs “Status” page you may discover that replacing a disk can run into problems when there are I/O errors, or that RAID support is “mostly OK”, which is not very encouraging.

Prepare the new machine

The new PC is equipped with a single (refurbished) NVMe disk, so no RAID (the previous PC had a RAID1), and non-ECC RAM (like before). Currently I have no budget for a second disk or ECC RAM, so I’ll live with it :-(

For the ZFS preparation, I adapted the good guide that you can find on the OpenZFS website. I used GRML, a Debian-based (bullseye at the time of writing) live distribution that ships with some extra tools. GRML comes with zfs-dkms installed: I removed it and installed the zfs-modules package (the pre-built ZFS kernel module) for the running kernel (I pre-build the ZFS package using dkms mkbmdeb; alternatively, you’ll need zfs-dkms and the kernel headers installed).
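For reference, this is roughly how the pre-built module package can be produced on another machine with dkms (the ZFS and kernel versions below are examples; use the ones matching the live system’s kernel):

# On a build box with zfs-dkms installed, package the already-built module
# as a binary .deb (example ZFS/kernel versions - adjust to your setup)
dkms mkbmdeb zfs/2.0.3 -k 5.10.0-9-amd64
# Copy the resulting zfs-modules-* package to the live system and install it
dpkg -i zfs-modules-5.10.0-9-amd64_*.deb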

Old storage layout

NAME            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda               8:0    0 465,8G  0 disk
├─sda1            8:1    0   953M  0 part
│ └─md0           9:0    0   952M  0 raid1 /boot
└─sda2            8:2    0 464,8G  0 part
  └─md1           9:1    0 464,7G  0 raid1
    └─md1_crypt 253:0    0 464,7G  0 crypt /
sdb               8:16   0 465,8G  0 disk
├─sdb1            8:17   0   953M  0 part
│ └─md0           9:0    0   952M  0 raid1 /boot
└─sdb2            8:18   0 464,8G  0 part
  └─md1           9:1    0 464,7G  0 raid1
    └─md1_crypt 253:0    0 464,7G  0 crypt /

New storage layout

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0   477G  0 disk
├─nvme0n1p2 259:1    0   512M  0 part
├─nvme0n1p3 259:2    0     2G  0 part
└─nvme0n1p4 259:3    0 474,4G  0 part

We’ll create a new GPT partition table with an EFI partition (nvme0n1p2), a boot pool named bpool (nvme0n1p3) and a root pool named rpool (nvme0n1p4). The bpool is needed because GRUB cannot boot from a pool that uses newer features, such as encryption. I’ll use native encryption on some datasets (no more LUKS, sorry). Quotas will limit the datasets so that a single runaway dataset cannot fill the pool and stall the system. Everything will be prepared inside /target while running the live distribution.

Here is a summary of the commands I used (most of them come from the OpenZFS guide mentioned above):

# Install requirements
# See the OpenZFS guide to install zfs-dkms and linux headers
apt install --yes debootstrap gdisk zfs-modules-$(uname -r) zfsutils-linux

# Set useful variable `DISK` with the path for the *new* physical disk (in my case, the NVMe)
# The path depends on your system
DISK=/dev/disk/by-id/nvme-eui.0000000000000000

# ZAP every partition on the new disk
sgdisk --zap-all $DISK

# UEFI partition
sgdisk     -n2:1M:+512M   -t2:EF00 $DISK

# Create boot pool partition
sgdisk     -n3:0:+1G      -t3:BF01 $DISK

# Create root pool partition
sgdisk     -n4:0:0        -t4:BF00 $DISK

# Create the boot pool bpool
zpool create \
    -o cachefile=/etc/zfs/zpool.cache \
    -o ashift=12 -d \
    -o feature@async_destroy=enabled \
    -o feature@bookmarks=enabled \
    -o feature@embedded_data=enabled \
    -o feature@empty_bpobj=enabled \
    -o feature@enabled_txg=enabled \
    -o feature@extensible_dataset=enabled \
    -o feature@filesystem_limits=enabled \
    -o feature@hole_birth=enabled \
    -o feature@large_blocks=enabled \
    -o feature@lz4_compress=enabled \
    -o feature@spacemap_histogram=enabled \
    -o feature@zpool_checkpoint=enabled \
    -O acltype=posixacl -O canmount=off -O compression=lz4 \
    -O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
    -O mountpoint=/boot -R /target \
    bpool ${DISK}-part3

# Create the encrypted root pool rpool
zpool create \
    -o ashift=12 \
    -O encryption=aes-256-gcm \
    -O keylocation=prompt -O keyformat=passphrase \
    -O acltype=posixacl -O canmount=off -O compression=lz4 \
    -O dnodesize=auto -O normalization=formD -O relatime=on \
    -O xattr=sa -O mountpoint=/ -R /target \
    rpool ${DISK}-part4

# Create the ZFS dataset for /boot
zfs create -o mountpoint=/boot bpool/boot
# Set safe quota (plenty of space for kernels and initramfs)
zfs set quota=1200M bpool/boot

# Create the ZFS dataset for /
zfs create -o mountpoint=/ rpool/ROOT
zfs mount rpool/ROOT

# Create datasets for home dirs
zfs create rpool/home
zfs create -o mountpoint=/root rpool/home/root
# Set safe quota (usually there is nothing there)
zfs set quota=50G rpool/home

# Create datasets for /var
zfs create -o canmount=off rpool/var
zfs create rpool/var/log
zfs create rpool/var/spool
zfs create -o com.sun:auto-snapshot=false  rpool/var/cache
zfs create -o com.sun:auto-snapshot=false  rpool/var/tmp
chmod 1777 /target/var/tmp
zfs create -o com.sun:auto-snapshot=false  rpool/tmp
chmod 1777 /target/tmp
# Set safe quota values
zfs set quota=50G rpool/var
zfs set quota=50G rpool/tmp

# Create separate datasets for rancher k3s
zfs create -o com.sun:auto-snapshot=false -o mountpoint=/var/lib/rancher/k3s  rpool/k3s
zfs create rpool/k3s/agent
zfs create rpool/k3s/data
zfs create rpool/k3s/server
zfs create rpool/k3s/storage
# Set safe quota values (there we have some data, and container images)
zfs set quota=50G rpool/k3s/agent
zfs set quota=50G rpool/k3s/data
zfs set quota=50G rpool/k3s/server
zfs set quota=300G rpool/k3s/storage

If everything is correct, zfs list should look something like this:

NAME                USED  AVAIL     REFER  MOUNTPOINT
bpool               112M  1.64G       96K  /target/boot
bpool/boot          112M  1.06G      112M  /target/boot
rpool               149G   309G      192K  /target
rpool/ROOT         12.7G   309G     12.7G  /target
rpool/home         1.79G  48.2G      264K  /target/home
rpool/home/root    1.79G  48.2G     1.79G  /target/root
rpool/k3s           134G   309G      296K  /target/var/lib/rancher/k3s
rpool/k3s/agent    18.0G  32.0G     18.0G  /target/var/lib/rancher/k3s/agent
rpool/k3s/data      147M  49.9G      147M  /target/var/lib/rancher/k3s/data
rpool/k3s/server   7.16M  50.0G     7.16M  /target/var/lib/rancher/k3s/server
rpool/k3s/storage   116G   184G      116G  /target/var/lib/rancher/k3s/storage
rpool/tmp          4.91M  50.0G     4.91M  /target/tmp
rpool/var           112M  49.9G      192K  /target/var
rpool/var/cache    48.3M  49.9G     48.3M  /target/var/cache
rpool/var/log      61.4M  49.9G     61.4M  /target/var/log
rpool/var/spool    1.50M  49.9G     1.50M  /target/var/spool
rpool/var/tmp       224K  49.9G      224K  /target/var/tmp

As you can see, some datasets have less AVAIL space because of the quotas.

Clone data and system

Now that we have a skeleton for the new system, we can proceed to copy everything.

Will the new system accept ZFS? What kind of issues will I face? I didn’t know yet, so I did some test runs using backup copies before powering off the old system and copying everything for real. Thanks to restic mount, I mounted the latest backup of my home server in /mnt/backup.
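The mount itself is a one-liner (the repository path below is a placeholder for my actual repository; the command stays in the foreground, so run it in a separate terminal):

# Mount the restic repository via FUSE; snapshots become browsable
# under /mnt/backup/hosts/<hostname>/ (repository path is a placeholder)
restic -r /path/to/restic-repo mount /mnt/backup

Then I rsync’ed everything into /target: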

rsync -ahPHAXx --info=progress2 -e ssh /mnt/backup/hosts/proxima/latest/ /target

I already had the ZFS module installed on the old system because I was using it on an external drive. However, we’ll need zfs-initramfs to load ZFS at boot. We also need to install grub-efi and switch to EFI boot (the previous system was booting via CSM/BIOS).

# Save old ZFS cache and replace with the temporary one
mv /target/etc/zfs/zpool.cache /target/etc/zfs/zpool.cache.old
zpool set cachefile=/etc/zfs/zpool.cache rpool
zpool set cachefile=/etc/zfs/zpool.cache bpool
cp /etc/zfs/zpool.cache /target/etc/zfs/zpool.cache

# Prepare chroot environment
mount -o bind /dev/ /target/dev
mount -t proc none /target/proc
mount -t sysfs none /target/sys

# Switch root
chroot /target /usr/bin/env DISK=$DISK bash --login
export PS1="(chroot) $PS1"
# I'm using LC_ALL="it_IT.UTF-8"
export LC_ALL="it_IT.UTF-8"

apt update

# Remove old packages
apt remove cryptsetup-initramfs

# If you don't have ZFS already installed, you should do it now. See the OpenZFS guide

# Install ZFS initramfs tools
apt install -t buster-backports zfs-initramfs

# Prepare EFI (see OpenZFS page)
apt install dosfstools
mkdosfs -F 32 -s 1 -n EFI ${DISK}-part2
mkdir /boot/efi
echo /dev/disk/by-uuid/$(blkid -s UUID -o value ${DISK}-part2) \
   /boot/efi vfat defaults 0 0 >> /etc/fstab
mount /boot/efi
apt install --yes grub-efi-amd64 shim-signed

# I don't have any other OS, so we can safely remove os-prober
apt purge --yes os-prober

Now we need to fix /etc/fstab and /etc/crypttab: there is no need for the old mounts there (ZFS handles its own mount points), so I deleted the entries for /boot and / in fstab, and the md1_crypt entry in crypttab.
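For reference, after the cleanup my fstab contained little more than the EFI partition added earlier, and crypttab no longer referenced md1_crypt:

# /etc/fstab after the cleanup: the old /boot and / entries are gone,
# only the EFI system partition remains (the UUID is a placeholder)
/dev/disk/by-uuid/XXXX-XXXX /boot/efi vfat defaults 0 0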

After that, we can configure the auto-import of bpool (this snippet comes from the OpenZFS guide) by creating a new file at /etc/systemd/system/zfs-import-bpool.service with this content:

[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache

[Install]
WantedBy=zfs-import.target

Enable the service by issuing systemctl enable zfs-import-bpool.service.

I had an issue at this point: grub-probe /boot was not recognizing zfs. I don’t know what happened, but I had to reboot GRML and re-enter the chroot, and the problem disappeared…
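In any case, grub-probe is a quick sanity check to run before generating the configuration; it should print zfs:

# Should print "zfs"; if it doesn't, update-grub will not produce a working config
grub-probe /boot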

Now we can configure grub by editing /etc/default/grub and adding root=ZFS=rpool/ROOT to the GRUB_CMDLINE_LINUX variable.
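The relevant line ends up looking like this (keep any other options you already have in that variable):

# /etc/default/grub (excerpt)
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT"

Then update the grub configuration and install grub: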

update-grub

grub-install --target=x86_64-efi --efi-directory=/boot/efi \
    --bootloader-id=debian --recheck --no-floppy

Finally, we need to fix the Ethernet interface name: I had eno1 on the previous system, while the new one uses enp0s31f6.
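A minimal way to do it, assuming the interface is configured with ifupdown (adjust the path if you use systemd-networkd or another tool):

# Replace the old interface name with the new one in the network configuration
# (assumes ifupdown; the file to edit depends on your setup)
sed -i 's/eno1/enp0s31f6/g' /etc/network/interfaces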

Boot

At this point, everything should be OK. If Debian can’t boot and drops you in a shell, verify that you installed the correct version of zfs-initramfs (at least 2.0.3), as older versions have issues with encryption in rpool.
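A quick way to check the installed version from the chroot:

# Print the installed zfs-initramfs version; it should be 2.0.3 or newer
dpkg-query -W -f='${Version}\n' zfs-initramfs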

Enjoy your ZFS-on-root server :-)

Acknowledgements

Thanks to @lerrigatto for the advice on some ZFS features :-)

Addendum: remotely unlock encrypted ZFS root pool/dataset

To remotely unlock encrypted ZFS datasets (or dm-crypt partitions), we need:

  1. the kernel to configure an IP address (and, optionally, a gateway)
  2. an SSH server inside the initramfs

So, let’s install dropbear. Debian provides a convenient pre-configured package:

apt install dropbear-initramfs

Put your SSH public key inside /etc/dropbear-initramfs/authorized_keys. Note that older versions of dropbear (like the one in Debian Buster) don’t support ed25519 keys, so I keep a dedicated RSA key for this purpose. After configuring dropbear, rebuild all initramfs images with update-initramfs -k all -u.
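For example (the key path below is a placeholder for whatever dedicated key you use):

# Authorize the key for the SSH server inside the initramfs (key path is a placeholder)
cat /root/.ssh/dropbear_rsa.pub >> /etc/dropbear-initramfs/authorized_keys
# Rebuild all initramfs images so dropbear picks up the key
update-initramfs -k all -u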

Now we need to add the IP configuration to the GRUB_CMDLINE_LINUX_DEFAULT variable (in /etc/default/grub). The syntax is explained in the official Linux kernel documentation:

ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip>

Unused fields can be left empty. Alternatively, a single valid autoconf value (e.g., ip=dhcp) can be specified instead of the full field list.

Static assignment example:

ip=192.0.2.10::192.0.2.1:255.255.255.0
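In /etc/default/grub the result looks like this (append it to any options you already have in that variable):

# /etc/default/grub (excerpt; addresses are the example ones above)
GRUB_CMDLINE_LINUX_DEFAULT="ip=192.0.2.10::192.0.2.1:255.255.255.0"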

Update the configuration with update-grub. At boot, you can then SSH into the server’s initramfs and issue the zfsunlock command to unlock the root pool.