I had to migrate one storage server from FreeBSD (NAS4Free, to be exact) to CentOS Linux 7. Sadly, recent NAS4Free was just too unstable on this particular hardware: for example, any attempt to change the configuration through the web interface caused a reboot with no meaningful message in the logs. That is unacceptable, as I rely on this server in a few of my projects, such as the diskless boot of HP blades in my OpenStack deployment. Shame, because I liked the idea behind it.
Anyway, because I now consider ZFS on Linux production ready, I decided to move to CentOS 7. I like CentOS more and more, and with version 7 supported until 2024 I’m getting 8 more years of a trouble-free ride.
Before deploying the new OS I removed the log and cache devices from my ZFS pool. What I didn’t do was remove the spare, and that bit me in the, oh, you probably know where. When I imported my pool under CentOS, the spare disk showed up with status “UNAVAIL”.
# zpool status -v
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 2h19m with 0 errors on Tue Nov 1 03:19:26 2016
config:

        NAME                                    STATE     READ WRITE CKSUM
        tank                                    ONLINE       0     0     0
          raidz3-0                              ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B3_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B2_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B2_WD-xxxx  ONLINE       0     0     0
        spares
          mfisyspd10                            UNAVAIL

errors: No known data errors
An attempt to run “zpool remove tank mfisyspd10” was unsuccessful, as zpool claimed it could not see this device. D’oh.
Fortunately ZFS comes with zdb, a low-level utility that can display lots of interesting stuff, if you are into this kind of thing. Most importantly, it can help us determine the numerical ID (GUID) of the device, which can then be used to operate on this disk.
By examining the contents of /dev/disk/by-id/ and matching serial numbers, I realised that the “missing” mfisyspd10 is now called “sdk” under Linux.
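The serial-number lookup can be sketched as a small shell function. This is only an illustration, not the exact commands I ran: the name list_by_id and its optional directory argument are made up for this example, and readlink -f is assumed to be available (it is on CentOS 7).

```shell
# Print each persistent disk name next to the kernel device it points to.
# list_by_id is a hypothetical helper name. The directory argument exists
# only so the function can be tried on a scratch directory; it defaults
# to the real /dev/disk/by-id.
list_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue    # skip anything that is not a symlink
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}
```

Piping the result through grep with the spare's serial number (for example list_by_id | grep 'WD-xxxx') should show which sdX name the disk received.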
zdb -l /dev/sdk
# this came back with long numerical ID
zpool remove tank 12658963864105390900
# now phantom should be gone, as confirmed with zpool status -v
# we can re-add it using Linux mechanism
zpool add tank spare -f /dev/disk/by-id/ata-WDC_WD4000FYYZ-01UL1B2_WD-xxxxxxxxx
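If you would rather not copy the long number by hand, something like the following could pull it out of the zdb output. It is a rough sketch: first_guid is a hypothetical helper, and it assumes the label printed by zdb -l contains a line of the form "guid: <number>" for the device itself, which is what I would expect but should be checked by eye before removing anything.

```shell
# Read `zdb -l` output on stdin and print the value of the first line
# whose first field is exactly "guid:" (so pool_guid and top_guid lines
# are ignored). first_guid is a hypothetical helper name.
first_guid() {
    awk '$1 == "guid:" { print $2; exit }'
}
```

Used as guid=$(zdb -l /dev/sdk | first_guid), after which zpool remove tank "$guid" does the same job as the manual command above.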
Done. Now I can re-add the cache and log devices, using partitions from my internal SSD drives, and start feeding ZFS pool cache/log data into Check_MK using this script
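For the cache and log step, the commands would look roughly like the ones printed below. The partition names are placeholders invented for this example, not my real devices, and the function only prints the zpool add commands so they can be reviewed before being run by hand.

```shell
# Hypothetical partition names; replace them with your own SSD partitions.
LOG_DEV="/dev/disk/by-id/ata-EXAMPLE_SSD-part1"
CACHE_DEV="/dev/disk/by-id/ata-EXAMPLE_SSD-part2"

# Print (not run) the commands that add a log and a cache device to a pool.
# readd_commands is a hypothetical helper name.
readd_commands() {
    pool="$1"; log_dev="$2"; cache_dev="$3"
    printf 'zpool add %s log %s\n' "$pool" "$log_dev"
    printf 'zpool add %s cache %s\n' "$pool" "$cache_dev"
}

readd_commands tank "$LOG_DEV" "$CACHE_DEV"
```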