Nov 15 2016

I had to migrate one storage server from FreeBSD (NAS4Free, to be exact) to CentOS Linux 7. Sadly, recent NAS4Free was just too unstable on this particular hardware – for example, any attempt to change the configuration through the web interface caused a reboot with no meaningful message in the logs. That was unacceptable, as I rely on this box in a few of my projects, such as the diskless boot of HP Blades in my OpenStack deployment. A shame, because I liked the idea behind it.

Anyway, since I now consider ZFS on Linux production ready, I decided to move to CentOS 7 – I like CentOS more and more, and with version 7 supported until 2024 I get eight more years of a trouble-free ride.

Before deploying the new OS I removed the log and cache devices from my ZFS pool. What I didn’t do was remove the spare, and that bit me in the, oh, you know probably where. When I imported the pool under CentOS, the spare disk showed up with status “UNAVAIL”.

# zpool status -v
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 2h19m with 0 errors on Tue Nov  1 03:19:26 2016
config:

	NAME                                            STATE     READ WRITE CKSUM
	tank                                            ONLINE       0     0     0
	  raidz3-0                                      ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B3_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B2_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B0_WD-xxxx  ONLINE       0     0     0
	    ata-WDC_WD4000FYYZ-01UL1B2_WD-xxxx  ONLINE       0     0     0
	spares
	  mfisyspd10                                    UNAVAIL 

errors: No known data errors

An attempt to “zpool remove tank mfisyspd10” was unsuccessful – zpool claimed it could not see this device. D’oh.

Fortunately ZFS ships with zdb, a low-level utility that can display lots of interesting stuff, if you are into this kind of thing. Most importantly, it can help us determine the numerical ID (GUID) of the device, an ID that can then be used to operate on the disk.
By examining the contents of /dev/disk/by-id/ and matching serial numbers, I realised that the “missing” mfisyspd10 is now called “sdk” under Linux.

zdb -l /dev/sdk # this dumps the label, including the device's long numerical GUID

zpool remove tank 12658963864105390900 # the phantom spare should now be gone, as confirmed with zpool status -v

# we can re-add it using the Linux device naming

zpool add -f tank spare /dev/disk/by-id/ata-WDC_WD4000FYYZ-01UL1B2_WD-xxxxxxxxx

Done. Now I can re-add the cache and log devices using partitions from my internal SSD drives, and start feeding ZFS pool cache/log data into Check_MK using this script.
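
For reference, re-adding them looks roughly like this – a minimal sketch in which the SSD partition paths are hypothetical placeholders, not the actual devices in this box:

# hypothetical SSD partitions: the first pair becomes a mirrored log (SLOG), the second pair cache (L2ARC)
zpool add tank log mirror /dev/disk/by-id/ata-SSD_A-part1 /dev/disk/by-id/ata-SSD_B-part1
zpool add tank cache /dev/disk/by-id/ata-SSD_A-part2 /dev/disk/by-id/ata-SSD_B-part2
zpool status -v # the log mirror and cache devices should now be listed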

Sep 16 2016

My notes for installing Son of Grid Engine (SGE) on a commodity cluster.


Intro

Grab the following RPM packages from https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/:

gridengine-8.1.9-1.el6.x86_64.rpm
gridengine-debuginfo-8.1.9-1.el6.x86_64.rpm
gridengine-devel-8.1.9-1.el6.noarch.rpm
gridengine-drmaa4ruby-8.1.9-1.el6.noarch.rpm
gridengine-execd-8.1.9-1.el6.x86_64.rpm
gridengine-guiinst-8.1.9-1.el6.noarch.rpm
gridengine-qmaster-8.1.9-1.el6.x86_64.rpm
gridengine-qmon-8.1.9-1.el6.x86_64.rpm

(version 8.1.9 at the time of writing).

For your convenience, the following one-liner should fetch them all for you 🙂

cd /tmp; for i in gridengine-8.1.9-1.el6.x86_64.rpm gridengine-debuginfo-8.1.9-1.el6.x86_64.rpm gridengine-devel-8.1.9-1.el6.noarch.rpm gridengine-drmaa4ruby-8.1.9-1.el6.noarch.rpm gridengine-execd-8.1.9-1.el6.x86_64.rpm gridengine-guiinst-8.1.9-1.el6.noarch.rpm gridengine-qmaster-8.1.9-1.el6.x86_64.rpm gridengine-qmon-8.1.9-1.el6.x86_64.rpm; do wget https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/$i;done

Pick one server that will serve as the master node in your cluster, referred to later as qmaster.
For smaller clusters it can happily run on a small VM (say 2x vCPU, 2 GB RAM), maximising your resource usage.

Install EPEL on all nodes

rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm

Install prerequisites on all nodes

yum install -y perl-Env.noarch perl-Exporter.noarch perl-File-BaseDir.noarch perl-Getopt-Long.noarch perl-libs perl-POSIX-strptime.x86_64 perl-XML-Simple.noarch jemalloc munge-libs hwloc lesstif csh ruby xorg-x11-fonts xterm java xorg-x11-fonts-ISO8859-1-100dpi xorg-x11-fonts-ISO8859-1-75dpi mailx

Install GridEngine packages on all nodes

cd /tmp/
yum localinstall gridengine-*

Install Qmaster

cd /opt/sge
./install_qmaster

Accepting the defaults should just work, though you might want to run it under a different user than r00t, so:

"Please enter a valid user name >> sgeadmin"

Make sure to add GridEngine to global environment:

cp /opt/sge/default/common/settings.sh /etc/profile.d/sge.sh
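
A quick sanity check that the environment gets picked up – settings.sh sets SGE_ROOT among other variables:

. /etc/profile.d/sge.sh
echo $SGE_ROOT        # should print /opt/sge
qconf -sconf | head   # the qmaster should answer configuration queries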

NFS export SGE root to nodes in your cluster

vim /etc/exports

/opt/sge 10.10.80.0/255.255.255.0(rw,no_root_squash,sync,no_subtree_check,nohide)
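
After editing /etc/exports, reload the export table (and make sure the NFS server itself is running):

exportfs -ra   # re-read /etc/exports
exportfs -v    # verify /opt/sge is exported to 10.10.80.0/24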

Then mount the share on the exec nodes

vim /etc/fstab

qmaster:/opt/sge 	/opt/sge nfs	tcp,intr,noatime	0	0
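
Then create the mount point and mount it on every exec node:

mkdir -p /opt/sge
mount /opt/sge    # uses the /etc/fstab entry above
df -h /opt/sge    # should show qmaster:/opt/sge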

 

Installing exec nodes

cd /opt/sge
./install_execd

Just go with the flow here. Once done you should be able to see your exec nodes:

# qhost 
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
execnode01              lx-amd64        8    2    8    8  0.12   15.6G    5.2G   20.0G  104.9M
execnode02              lx-amd64        8    2    8    8  0.00   15.7G    1.3G   21.1G     0.0
execnode03              lx-amd64        8    2    8    8  0.00   15.7G    1.4G   21.1G   18.6M

That means you can start submitting jobs to your cluster – either interactively with qlogin or qrsh, or as batch jobs with qsub.
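
For example, a trivial batch job to confirm the plumbing works (test.sh is just a throwaway name here):

echo 'hostname; sleep 10' > test.sh
qsub -cwd -N hello test.sh   # lands in the default all.q
qstat                        # the job should move from qw (waiting) to r (running)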

Adding queues (for FSL)

In most cases it’s enough to have the default queue called all.q.

This example will define new queues with different priorities (nice levels):

# change defaults for all.q
qconf -sq all.q |\
    sed -e 's/bin\/csh/bin\/sh/' |\
    sed -e 's/posix_compliant/unix_behavior/' |\
    sed -e 's/priority              0/priority 20/' >\
    /tmp/q.tmp
qconf -Mq /tmp/q.tmp

# add other queues
sed -e 's/all.q/verylong.q/' /tmp/q.tmp >\
   /tmp/verylong.q
qconf -Aq /tmp/verylong.q

sed -e 's/all.q/long.q/' /tmp/q.tmp |\
   sed -e 's/priority *20/priority 15/' >\
   /tmp/long.q
qconf -Aq /tmp/long.q

sed -e 's/all.q/short.q/' /tmp/q.tmp |\
   sed -e 's/priority *20/priority 10/' >\
   /tmp/short.q
qconf -Aq /tmp/short.q

sed -e 's/all.q/veryshort.q/' /tmp/q.tmp |\
   sed -e 's/priority *20/priority 5/' >\
   /tmp/veryshort.q
qconf -Aq /tmp/veryshort.q
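
A quick way to confirm the queues were created as intended:

qconf -sql                          # lists all defined queues
qconf -sq short.q | grep priority   # the nice level should read 10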

Monitoring your cluster

 

Use the qmon GUI or the following commands:

# qstat -f

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@execnode01               BIP   0/0/8          0.12     lx-amd64
---------------------------------------------------------------------------------
all.q@execnode02               BIP   0/0/8          0.00     lx-amd64
---------------------------------------------------------------------------------
all.q@execnode03               BIP   0/0/8          0.00     lx-amd64
---------------------------------------------------------------------------------
long.q@execnode01              BIP   0/0/8          0.12     lx-amd64
---------------------------------------------------------------------------------
long.q@execnode02              BIP   0/0/8          0.00     lx-amd64
---------------------------------------------------------------------------------
long.q@execnode03              BIP   0/0/8          0.00     lx-amd64
---------------------------------------------------------------------------------
short.q@execnode01             BIP   0/0/8          0.12     lx-amd64
---------------------------------------------------------------------------------
short.q@execnode02             BIP   0/0/8          0.00     lx-amd64
---------------------------------------------------------------------------------
short.q@execnode03             BIP   0/0/8          0.00     lx-amd64
---------------------------------------------------------------------------------
verylong.q@execnode01          BIP   0/0/8          0.12     lx-amd64
---------------------------------------------------------------------------------
verylong.q@execnode02          BIP   0/0/8          0.00     lx-amd64
---------------------------------------------------------------------------------
verylong.q@execnode03          BIP   0/0/8          0.00     lx-amd64
---------------------------------------------------------------------------------
veryshort.q@execnode01         BIP   0/0/8          0.12     lx-amd64
---------------------------------------------------------------------------------
veryshort.q@execnode02         BIP   0/0/8          0.00     lx-amd64
---------------------------------------------------------------------------------
veryshort.q@execnode03         BIP   0/0/8          0.00     lx-amd64

# qhost -q

HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
execnode01              lx-amd64        8    2    8    8  0.12   15.6G    5.2G   20.0G  104.9M
   all.q                BIP   0/0/8         
   long.q               BIP   0/0/8         
   short.q              BIP   0/0/8         
   veryshort.q          BIP   0/0/8         
   verylong.q           BIP   0/0/8         
execnode02              lx-amd64        8    2    8    8  0.00   15.7G    1.3G   21.1G     0.0
   all.q                BIP   0/0/8         
   long.q               BIP   0/0/8         
   short.q              BIP   0/0/8         
   veryshort.q          BIP   0/0/8         
   verylong.q           BIP   0/0/8         
execnode03              lx-amd64        8    2    8    8  0.00   15.7G    1.4G   21.1G   18.6M
   all.q                BIP   0/0/8         
   long.q               BIP   0/0/8         
   short.q              BIP   0/0/8         
   veryshort.q          BIP   0/0/8         
   verylong.q           BIP   0/0/8