devnull

I've been a Linux Mercenary for quite a while now, often relying on information posted by kind strangers on the Internet to solve problems along the way. The real strength of Linux lies in its community, and this blog is a humble attempt to give something back to these wonderful people. If by any chance I helped you sort out a problem, please consider funding a cup of espresso to keep me going: BTC 1DXKjB9isvdLwRy6ABsRSZvVGxzqCvbrXy. Thank you.

Nov 15 2016
 

I had to migrate a storage server from FreeBSD (NAS4Free, to be exact) to CentOS Linux 7. Sadly, recent NAS4Free releases were just too unstable on this particular hardware; for example, any attempt to change the configuration through the web interface caused a reboot with no meaningful message in the logs. That was unacceptable, as I rely on this box in a few of my projects, for example the diskless boot of HP Blades in my OpenStack deployment. A shame, because I liked the idea behind it.

Anyway, since I now consider ZFS on Linux production ready, I decided to move to CentOS 7. I like CentOS more and more, and with version 7 being supported until 2024 I'm getting eight more years of a trouble-free ride.

Before deploying the new OS I removed the log and cache devices from my ZFS pool. What I didn't do was remove the spare, and that bit me in the, oh, you probably know where. When I imported the pool under CentOS, the spare disk was in status "UNAVAIL".

# zpool status -v
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 2h19m with 0 errors on Tue Nov  1 03:19:26 2016
config:

	NAME                                            STATE     READ WRITE CKSUM
	tank                                            ONLINE       0     0     0
	  raidz3-0                                      ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B3_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B2_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B2_WD-xxxx  ONLINE       0     0     0
	spares
	  mfisyspd10                                    UNAVAIL 

errors: No known data errors

An attempt to "zpool remove tank mfisyspd10" was unsuccessful, as zpool claimed it could not see this device. D'oh.

Fortunately ZFS comes with zdb, a low-level utility that can display lots of interesting stuff, if you are into that kind of thing. Most importantly, it can help us determine the numerical GUID of the device, an ID that can be used to operate on this disk.
By examining the contents of /dev/disk/by-id/ and matching serial numbers, I realised that the "missing" mfisyspd10 is now called "sdk" under Linux.
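Matching serials by hand works, but the by-id directory is just a set of symlinks, so the whole mapping can be dumped in one go. A minimal sketch (the helper name and the directory parameter are mine; it defaults to /dev/disk/by-id):

```shell
#!/bin/sh
# Print each persistent disk alias together with the kernel device
# its symlink resolves to. The directory argument is optional and
# defaults to /dev/disk/by-id.
list_disk_aliases() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}

list_disk_aliases "$@"
```

Matching the serial embedded in the ata-WDC_… alias against the resolved /dev/sdX name tells you which kernel device each ZFS label lives on.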

zdb -l /dev/sdk # this came back with long numerical ID
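zdb -l dumps the whole vdev label, and the guid field buried in it is the number zpool accepts. A small helper to pull it out of the dump, assuming the typical "guid: NNN" field layout of zdb -l output (helper name is mine):

```shell
#!/bin/sh
# Extract the first "guid:" value from a zdb label dump read on stdin.
# Typical use (assumed): zdb -l /dev/sdk | extract_guid
# The anchored pattern skips related fields such as pool_guid.
extract_guid() {
    awk '/^[[:space:]]*guid:/ {print $2; exit}'
}
```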

zpool remove tank 12658963864105390900 # now the phantom should be gone, as confirmed with zpool status -v

# we can re-add it using the Linux by-id naming

zpool add tank spare -f /dev/disk/by-id/ata-WDC_WD4000FYYZ-01UL1B2_WD-xxxxxxxxx

Done. Now I can re-add the cache and log devices, using partitions from my internal SSD drives, and start feeding ZFS pool cache/log data into Check_MK using this script.
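For reference, a Check_MK local check is just a line of "STATUS NAME METRICS TEXT" printed to stdout. A minimal, hypothetical sketch (not the linked script) formatting an ARC hit ratio in that shape, with a made-up service name:

```shell
#!/bin/sh
# Format one Check_MK local-check line: "STATUS NAME METRICS TEXT".
# Arguments: hits misses (integers). Service name and values are
# purely illustrative.
arc_check_line() {
    hits=$1; misses=$2
    total=$((hits + misses))
    ratio=$((100 * hits / total))
    printf '0 ZFS_ARC hit_ratio=%d ARC hit ratio is %d%%\n' "$ratio" "$ratio"
}
```

Dropped into the agent's local-checks directory, a script printing such lines shows up as a regular service in Check_MK.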

Oct 26 2016
 

Today I was upgrading the Dell PERC 6/i Integrated controller firmware on a, rather old to put it mildly, PowerEdge 2950 server running CentOS 7. Sadly, the update kept failing with the following message when I tried firing the DUP (Dell Update Package for Red Hat Linux, SAS-RAID_Firmware_3P52K_LN_6.3.3-0002_X00.BIN):

Oct 26 13:51:15 mielnet-web-dev03 kernel: sasdupie[11976]: segfault at 20 ip 00007fe7b2f3000d sp 00007ffe7d58bb00 error 4 in sasdupie[7fe7b2f0b000+110000]

What is going on? What's causing the segfault? After some fiddling and googling I ended up doing this:

chmod +x /tmp/SAS-RAID_Firmware_3P52K_LN_6.3.3-0002_X00.BIN 
/tmp/SAS-RAID_Firmware_3P52K_LN_6.3.3-0002_X00.BIN --extract /tmp/dup_extract_dir
cd /tmp/dup_extract_dir
./sasdupie -i -o inv.xml -debug

and after examining /tmp/dup_extract_dir/debug.log it turned out that sasdupie was segfaulting when trying to use libstorelibir.so.5, so I figured the version on the system might just be too new. Let's try using a slightly older version of this shared object located under /opt/dell/srvadmin/lib64/ (it's part of the srvadmin-storelib RPM package, by the way, which can be installed from the Dell repo).

cd /opt/dell/srvadmin/lib64
rm libstorelibir.so.5
ln -s libstorelibir-3.so libstorelibir.so.5
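If you prefer to keep a way back instead of deleting the library outright, the same swap can be done with a backup. A small sketch under the same paths as above (the helper name is mine):

```shell
#!/bin/sh
# Repoint a library symlink to an older version, backing up the original.
# Usage: swap_lib <dir> <link-name> <new-target>
# e.g.:  swap_lib /opt/dell/srvadmin/lib64 libstorelibir.so.5 libstorelibir-3.so
swap_lib() {
    ( cd "$1" &&
      mv "$2" "$2.bak" &&   # keep the original around
      ln -s "$3" "$2" )
}
```

Restoring the old state is then a matter of removing the symlink and renaming the .bak file back.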

Keeping my fingers crossed and touching wood, I typed using my nose:

/tmp/SAS-RAID_Firmware_3P52K_LN_6.3.3-0002_X00.BIN

Running validation...

PERC 6/i Integrated Controller 0

The version of this Update Package is newer than the currently installed version.
Software application name: PERC 6/i Integrated Controller 0 Firmware
Package version: 6.3.3-0002
Installed version: 6.0.2-0002

................................................
Device: PERC 6/i Integrated Controller 0
  Application: PERC 6/i Integrated Controller 0 Firmware
  The operation was successful.


Nice. No segfaults. Shout-out to Luiz Angelo Daros de Luca.