ZFS (Zettabyte File System) is a feature-rich and advanced file system known for its robust data integrity, snapshot capabilities, efficient copy-on-write mechanism, and support for data compression. It's commonly used in various operating systems for reliable and scalable data storage and management.
In a ZFS mirror pool, data is stored redundantly by mirroring it across two or more drives or devices. Every drive holds a full copy of the data, which provides redundancy and better read performance, since reads can be served from several drives in parallel.
Over time, drives can degrade or fail, as happened to me this week. In this post I show the steps I took to replace the degraded drive.
Before you proceed
- As a sysadmin and security engineer, I extend my life expectancy by always taking backups first
- Obviously you need a new drive of the same size or larger
If there is room to connect the new drive alongside the old one, I personally prefer to do the replacement at the ZFS level while the damaged drive stays physically connected, and to remove it only later, once the mirror is healthy again.
Identify the degraded drive
Run zpool status to check your pool and identify the faulted drive:
# zpool status
  pool: data
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired
  scan: resilvered 19.1 in 00:00:48 with 0 errors on Tue Sep 19 09:13:54 2023
config:

        NAME                        STATE     READ WRITE CKSUM
        data                        DEGRADED     0     0     0
          mirror-0                  DEGRADED     0     0     0
            wwn-0x5000c500a8245fa6  ONLINE       0     0     0
            wwn-0x5000c500a8220203  DEGRADED    22     0     0  too many errors
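On a pool with many drives it can help to filter that output. The following is a minimal sketch, not an official ZFS tool: the function name unhealthy_devices is my own, and it assumes the config table looks like the one above (a NAME/STATE header, one row per device). It prints every row whose state is not ONLINE, which includes the pool and mirror rows as well as the drive itself.

```shell
#!/bin/sh
# Print "name state" for every row of the `zpool status` config table
# whose STATE column is not ONLINE. Reads the status text on stdin,
# so it can also be tried on saved output.
# unhealthy_devices is a hypothetical helper name, not a ZFS command.
unhealthy_devices() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The trailing "errors:" line marks its end.
        /^errors:/ { in_config = 0 }
        # Device rows have at least NAME STATE READ WRITE CKSUM.
        in_config && NF >= 5 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Usage, with the pool name "data" from this post:
#   zpool status data | unhealthy_devices
```

For the pool above this would list data, mirror-0 and wwn-0x5000c500a8220203, all in state DEGRADED.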
Connect and verify the new drive
Connect the new drive physically to your system and check it has been recognized.
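I added two new drives, and dmesg showed each of them in a few log lines.
You can also check with the lsblk command.
Then identify the new disk by its persistent name under /dev/disk/by-id, since that full path is what the replacement command takes.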
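To go from a kernel name reported by dmesg or lsblk to a persistent name, you can check which symlinks under /dev/disk/by-id point at that device. The sketch below assumes a Linux system with udev and a readlink that supports -f; the function name by_id_names and the device /dev/sdc in the usage line are hypothetical.

```shell
#!/bin/sh
# List the persistent names that point at a given device node.
#   $1: device node, for example /dev/sdc (hypothetical name)
#   $2: directory holding the symlinks, defaults to /dev/disk/by-id
# by_id_names is a hypothetical helper name.
by_id_names() {
    target=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (also covers an empty directory).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Usage:
#   by_id_names /dev/sdc
```

A drive usually shows up under more than one name (for example an ata-… and a wwn-… link); any of them should work as the new disk path.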
Run the replacement command
zpool replace <POOL> <DEGRADED_DISK> <NEW_DISK_COMPLETE_PATH>
# zpool replace data wwn-0x5000c500a8220203 /dev/disk/by-id/ata-WDC_WD20EFAX-68B2RN1_WD-WX32D53408US
# zpool status
  pool: data
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Sep 21 15:47:26 2023
        24.8G scanned at 4.96G/s, 372K issued at 74.4K/s, 667G total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            wwn-0x5000c500a8245fa6                      ONLINE       0     0     0
            replacing-1                                 ONLINE       0     0     0
              wwn-0x5000c500a8220203                    ONLINE       0     0     0
              ata-WDC_WD20EFAX-68B2RN1_WD-WX32D53408US  ONLINE       0     0     0

errors: No known data errors
Resilvering takes a while. Running zpool status again shows the progress:

  scan: resilver in progress since Thu Sep 21 15:47:26 2023
        194G scanned at 404M/s, 21.5G issued at 44.7M/s, 667G total
        21.5G resilvered, 3.22% done, 04:06:37 to go

Once the resilver completes, the pool is healthy again and the degraded drive is no longer part of the mirror:

# zpool status
  pool: data
 state: ONLINE
  scan: resilvered 668G in 04:47:28 with 0 errors on Thu Sep 21 20:34:54 2023
config:

        NAME                                          STATE     READ WRITE CKSUM
        data                                          ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            wwn-0x5000c500a8245fa6                    ONLINE       0     0     0
            ata-WDC_WD20EFAX-68B2RN1_WD-WX32D53408US  ONLINE       0     0     0

errors: No known data errors
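If you would rather not rerun zpool status by hand for hours, the progress figure can be pulled out of the scan lines. This is only a sketch that relies on the "N% done" wording seen in the output above; resilver_percent is a name I made up, and the wording could differ between ZFS versions.

```shell
#!/bin/sh
# Extract the "% done" figure from `zpool status` text on stdin.
# Prints nothing when no resilver is in progress.
# resilver_percent is a hypothetical helper name.
resilver_percent() {
    sed -n 's/.* \([0-9][0-9.]*\)% done.*/\1/p' | head -n 1
}

# Usage, with the pool name "data" from this post:
#   zpool status data | resilver_percent
#
# A simple polling loop could look like this:
#   while zpool status data | grep -q 'resilver in progress'; do
#       echo "resilver: $(zpool status data | resilver_percent)% done"
#       sleep 60
#   done
```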
Physically detach the degraded drive
To make sure you pull the correct drive, collect all the information you can about the degraded drive; its serial number will be among that data.
Shut down your system, check the serial numbers on the drives, and remove the correct one. That's all!
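One way to get that serial number is to ask lsblk for the WWN and SERIAL columns and match the WWN against the name zpool status showed. This is a sketch under assumptions: that your lsblk reports the WWN in the same 0x… form used in the wwn-… name, and that the drive is still connected. The helper name serial_for_wwn is hypothetical.

```shell
#!/bin/sh
# Look up a drive's serial number from its WWN.
#   stdin: lines of "NAME WWN SERIAL", as printed by
#          lsblk -d -n -o NAME,WWN,SERIAL
#   $1:    device name as shown by zpool status, e.g. wwn-0x5000c500a8220203
# serial_for_wwn is a hypothetical helper name.
serial_for_wwn() {
    wwn=${1#wwn-}    # drop the "wwn-" prefix used under /dev/disk/by-id
    # Concatenating "" forces a string comparison, so the hex value
    # is never interpreted as a number.
    awk -v wwn="$wwn" '($2 "") == (wwn "") { print $3 }'
}

# Usage:
#   lsblk -d -n -o NAME,WWN,SERIAL | serial_for_wwn wwn-0x5000c500a8220203
```

If smartmontools is installed, smartctl -i on the device should report the same serial number, which is a useful cross-check before pulling anything.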