ZFS (Zettabyte File System) is an advanced file system known for its robust data integrity, snapshots, efficient copy-on-write design, and built-in data compression. It is available on FreeBSD, illumos, and Linux (through OpenZFS), where it is used for reliable and scalable data storage and management.

In a ZFS mirror pool, data is stored redundantly across two or more drives: every drive holds a full copy of the data. This provides redundancy and can improve read performance, because reads can be served by several drives in parallel (writes, on the other hand, must go to every drive).
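To make the trade-off concrete, here is a toy Python model of an N-way mirror. It is my own illustration, not anything ZFS exposes, and it ignores metadata and reservation overhead: since every member stores a full copy, the smallest member caps the usable space, and the mirror keeps working while at least one member survives.

```python
from dataclasses import dataclass


@dataclass
class MirrorSummary:
    usable_bytes: int        # space available for data, before ZFS overhead
    tolerated_failures: int  # members that can fail without losing data


def summarize_mirror(device_sizes: list[int]) -> MirrorSummary:
    """Toy model of an N-way mirror vdev (ignores metadata and reservations).

    Every member stores a full copy of the data, so the smallest member caps
    the usable space, and the vdev keeps working while at least one member
    survives.
    """
    if len(device_sizes) < 2:
        raise ValueError("a mirror needs at least two devices")
    return MirrorSummary(
        usable_bytes=min(device_sizes),
        tolerated_failures=len(device_sizes) - 1,
    )


if __name__ == "__main__":
    # A 2 TB drive mirrored with a 3 TB drive only exposes 2 TB.
    print(summarize_mirror([2_000_000_000_000, 3_000_000_000_000]))
```

This is also why a replacement drive has to be at least as large as the one it replaces: the mirror cannot shrink to fit a smaller member.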

Over time, drives can degrade or fail, as one of mine did this week. In this post I show the steps I took to replace it.

Before you proceed

  1. As a sysadmin and security engineer, I extend my life expectancy by always taking backups.
  2. Obviously, you need a new drive of the same size as the old one or larger.
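Drives sold with the same nominal capacity can differ slightly in their exact byte count, so it is worth comparing real sizes before starting. A minimal sketch, assuming Linux with util-linux `lsblk`; the helper names `parse_lsblk_sizes`, `read_disk_sizes`, and `is_big_enough` are my own, hypothetical ones.

```python
import subprocess


def parse_lsblk_sizes(output: str) -> dict[str, int]:
    """Parse `lsblk -b -d -n -o NAME,SIZE` output into {name: size_in_bytes}."""
    sizes: dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, size = fields
        sizes[name] = int(size)
    return sizes


def read_disk_sizes() -> dict[str, int]:
    """Run lsblk on this machine and return whole-disk sizes in bytes."""
    result = subprocess.run(
        ["lsblk", "-b", "-d", "-n", "-o", "NAME,SIZE"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_lsblk_sizes(result.stdout)


def is_big_enough(sizes: dict[str, int], old: str, new: str) -> bool:
    """A replacement must be at least as large as the device it replaces."""
    return sizes[new] >= sizes[old]
```

For example, `is_big_enough(read_disk_sizes(), "sdb", "sdc")` would compare the kernel devices `sdb` and `sdc`; those names are placeholders for whatever your system assigns.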

If there is a free slot for the new drive, I prefer to replace the damaged one at the ZFS level while keeping it physically connected, and to remove it only once the mirror is healthy again. This way the old drive stays part of the pool until the new one has been fully resilvered.


Identify the degraded drive

Run zpool status to check your pool and identify the degraded drive:

# zpool status
  pool: data
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a degraded state
action: Replace the faulted device, or use 'zpool clear' to mark the device repaired
  scan: resilvered 19.1 in 00:00:48 with 0 errors on Tue Sep 19 09:13:54 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            wwn-0x5000c500a8245fa6                      ONLINE       0     0     0
            wwn-0x5000c500a8220203                      DEGRADED    22     0     0 too many errors
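If you manage several machines, spotting the bad disk can be scripted. A minimal sketch that parses the `config:` section of `zpool status <pool>` text for a single pool and returns leaf devices that are not ONLINE or have non-zero error counters. The state list and the indentation-based leaf detection are my own assumptions about the output layout, not a stable interface; `zpool status` is meant for humans, so treat this as a heuristic.

```python
from dataclasses import dataclass

# Device states as printed by `zpool status` (assumed list, may be incomplete).
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


@dataclass
class DeviceRow:
    name: str
    state: str
    read: str   # counters kept as strings: zpool may abbreviate, e.g. "1.2K"
    write: str
    cksum: str
    indent: int  # leading whitespace, used to tell vdevs from leaf disks


def parse_config(status: str) -> list[DeviceRow]:
    """Extract the NAME/STATE/READ/WRITE/CKSUM rows from `zpool status` text."""
    rows: list[DeviceRow] = []
    in_config = False
    for line in status.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if not in_config:
            continue
        if stripped.startswith("errors:"):
            break
        fields = stripped.split()
        if len(fields) < 5 or fields[1] not in STATES:
            continue  # blank lines and the column header
        indent = len(line) - len(line.lstrip())
        rows.append(DeviceRow(*fields[:5], indent=indent))
    return rows


def unhealthy_leaves(rows: list[DeviceRow]) -> list[DeviceRow]:
    """Return leaf devices that are not ONLINE or have non-zero error counters."""
    bad = []
    for i, row in enumerate(rows):
        # A row is a leaf if the next row is not indented deeper than it.
        next_indent = rows[i + 1].indent if i + 1 < len(rows) else -1
        is_leaf = next_indent <= row.indent
        has_errors = any(c != "0" for c in (row.read, row.write, row.cksum))
        if is_leaf and (row.state != "ONLINE" or has_errors):
            bad.append(row)
    return bad
```

Fed the output above, it would single out `wwn-0x5000c500a8220203` and skip `data` and `mirror-0`, which are DEGRADED only because of that disk.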


Connect and verify the new drive


Physically connect the new drive to your system and check that it has been recognized.

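I added two new drives, and dmesg reported each of them in a few log lines.

You can also check with the lsblk command.

Then identify the new disk by its persistent name under /dev/disk/by-id, which is the path the replacement command expects.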
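dmesg and lsblk report kernel names such as `sdc`, which can change between boots, while the pool should record a stable /dev/disk/by-id name. A minimal sketch that maps a kernel name to its by-id links by resolving the symlinks; `by_id_names` and the `by_id_dir` parameter are hypothetical helpers of mine, and `sdc` is only an example.

```python
import os
import re

# Partition links look like "<disk-id>-part1"; we only want whole disks.
PARTITION_RE = re.compile(r"-part\d+$")


def by_id_names(kernel_name: str, by_id_dir: str = "/dev/disk/by-id") -> list[str]:
    """Return whole-disk links in by_id_dir that resolve to /dev/<kernel_name>."""
    matches = []
    for entry in sorted(os.listdir(by_id_dir)):
        if PARTITION_RE.search(entry):
            continue  # skip partition links such as ...-part1
        link = os.path.join(by_id_dir, entry)
        # realpath follows the symlink, e.g. ../../sdc -> /dev/sdc
        if os.path.basename(os.path.realpath(link)) == kernel_name:
            matches.append(link)
    return matches
```

On a real system `by_id_names("sdc")` may return more than one entry (for example an `ata-...` and a `wwn-...` link for the same disk); any of them works as a persistent path.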

Run the replacement command


zpool replace <POOL> <DEGRADED_DISK> <NEW_DISK_COMPLETE_PATH>
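If you wrap this step in a script, a small guard can enforce the convention used here of passing the new disk as a full /dev/disk/by-id path. This is a sketch of my own: zpool itself also accepts short names, so the check is a policy choice rather than a zpool requirement, and `build_replace_cmd`/`run_replace` are hypothetical names. The dry run only prints the command.

```python
import shlex
import subprocess

BY_ID_PREFIX = "/dev/disk/by-id/"


def build_replace_cmd(pool: str, old_device: str, new_device: str) -> list[str]:
    """Build the argv for `zpool replace <pool> <old> <new>`.

    Refuses new devices that are not full /dev/disk/by-id/ paths, so the
    pool records a name that survives reboots and cabling changes.
    """
    if not new_device.startswith(BY_ID_PREFIX):
        raise ValueError(f"expected a {BY_ID_PREFIX} path, got {new_device!r}")
    return ["zpool", "replace", pool, old_device, new_device]


def run_replace(pool: str, old_device: str, new_device: str, dry_run: bool = True) -> None:
    """Print the command and, only when dry_run is False, execute it."""
    cmd = build_replace_cmd(pool, old_device, new_device)
    print("#", shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```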

# zpool replace data wwn-0x5000c500a8220203 /dev/disk/by-id/ata-WDC_WD20EFAX-68B2RN1_WD-WX32D53408US

# zpool status
  pool: data
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Sep 21 15:47:26 2023
        24.8G scanned at 4.96G/s, 372K issued at 74.4K/s, 667G total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            wwn-0x5000c500a8245fa6                      ONLINE       0     0     0
            replacing-1                                 ONLINE       0     0     0
              wwn-0x5000c500a8220203                    ONLINE       0     0     0
              ata-WDC_WD20EFAX-68B2RN1_WD-WX32D53408US  ONLINE       0     0     0

errors: No known data errors


This will take a while. Running zpool status again shows the progress:

  scan: resilver in progress since Thu Sep 21 15:47:26 2023
        194G scanned at 404M/s, 21.5G issued at 44.7M/s, 667G total
        21.5G resilvered, 3.22% done, 04:06:37 to go
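Instead of re-running zpool status by hand, you could poll it. A minimal sketch that pulls the percentage and remaining time out of the `scan:` lines shown above; the regex is my assumption about this output format, which may differ between OpenZFS versions. `wait_for_resilver` takes any function returning status text, for example one that runs `zpool status data` through `subprocess.run`, which keeps it testable.

```python
import re
import time
from typing import Callable, Optional

# Matches "3.22% done, 04:06:37 to go" and "0.00% done, no estimated completion time".
PROGRESS_RE = re.compile(r"([\d.]+)% done, (?:(.+) to go|no estimated completion time)")


def parse_resilver_progress(status: str) -> Optional[tuple[float, Optional[str]]]:
    """Return (percent_done, eta) from `zpool status` text, or None if no resilver runs."""
    if "resilver in progress" not in status:
        return None
    match = PROGRESS_RE.search(status)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


def wait_for_resilver(fetch_status: Callable[[], str], interval_s: float = 60.0) -> None:
    """Print progress until the status no longer reports a running resilver."""
    while (progress := parse_resilver_progress(fetch_status())) is not None:
        percent, eta = progress
        remaining = f"{eta} to go" if eta else "no estimate yet"
        print(f"{percent:.2f}% done, {remaining}")
        time.sleep(interval_s)
```

If your OpenZFS version provides it, `zpool wait -t resilver data` blocks until the resilver ends without any parsing.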

Once the resilver finishes, the old drive is gone from the mirror and the new one has taken its place:

# zpool status
  pool: data
 state: ONLINE
  scan: resilvered 668G in 04:47:28 with 0 errors on Thu Sep 21 20:34:54 2023
config:

        NAME                                          STATE     READ WRITE CKSUM
        data                                          ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            wwn-0x5000c500a8245fa6                    ONLINE       0     0     0
            ata-WDC_WD20EFAX-68B2RN1_WD-WX32D53408US  ONLINE       0     0     0

errors: No known data errors
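Note that the pool already reported `state: ONLINE` while it was still resilvering, so the state line alone is not enough to decide that the job is done. A heuristic sketch of my own that also requires no running resilver, no leftover `replacing-N` vdev, a clean last scan, and no known data errors; the string patterns are assumptions based on the output in this post.

```python
import re


def pool_is_healthy(status: str) -> bool:
    """Heuristic health check on `zpool status <pool>` text for one pool."""
    state = re.search(r"^\s*state:\s*(\S+)", status, re.MULTILINE)
    if state is None or state.group(1) != "ONLINE":
        return False
    if "resilver in progress" in status:
        return False  # still ONLINE during a resilver, but not finished
    if re.search(r"^\s*replacing-\d+", status, re.MULTILINE):
        return False  # old and new disk are both still attached
    if "errors: No known data errors" not in status:
        return False
    scan = re.search(r"with (\d+) errors", status)
    return scan is None or scan.group(1) == "0"
```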


Physically detach the degraded drive


To make sure you remove the correct drive, gather as much information as you can about the degraded one before shutting down. The key piece is its serial number, which is printed on the drive's label.
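The pool listed the failing disk by its WWN, so one way to get its serial number is to match that WWN against lsblk. A sketch assuming Linux with util-linux `lsblk`, whose `-P` mode prints `KEY="value"` pairs, and assuming its WWN column uses the same `0x...` form as the `wwn-` name in zpool status (verify this on your system). The helper names are hypothetical; `smartctl -i` on the device is another way to read the serial.

```python
import re
import subprocess
from typing import Optional

# lsblk -P prints one device per line as KEY="value" pairs.
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_lsblk_pairs(output: str) -> list[dict[str, str]]:
    """Turn `lsblk -P` output into one dict per device."""
    return [dict(PAIR_RE.findall(line)) for line in output.splitlines() if line.strip()]


def serial_for_wwn(zpool_name: str, lsblk_output: str) -> Optional[str]:
    """Map a wwn-0x... name from `zpool status` to the drive's serial number."""
    wwn = zpool_name[len("wwn-"):] if zpool_name.startswith("wwn-") else zpool_name
    for device in parse_lsblk_pairs(lsblk_output):
        if device.get("WWN") == wwn:
            return device.get("SERIAL") or None
    return None


def read_lsblk() -> str:
    """Run lsblk for whole disks with their WWN and serial number."""
    return subprocess.run(
        ["lsblk", "-d", "-n", "-P", "-o", "NAME,WWN,SERIAL"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
```

On the affected machine, `serial_for_wwn("wwn-0x5000c500a8220203", read_lsblk())` should return the serial to look for on the drive labels.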

Shut down your system, check the serial numbers on the drives, and remove the one that matches. That's all!