Sep 12 2016
 

Intro

This one is interesting. I’ve got a few HP BL260 blade servers, out of warranty but packed with RAM and CPU cores. I wanted to use them as compute nodes in my OpenStack cloud, but all of their internal SFF SATA drives (literally every single one!) died within 6 years.

Instead of replacing them, I decided to get rid of the internal hard drives altogether and use CentOS’s ability to keep its root partition on a remote storage device, in the same manner as VMware ESXi hosts booting from an iSCSI SAN. That means no spinning disks inside the compute node, so no extra heat or energy consumption.

These cheap blades didn’t have a fancy HBA able to boot from iSCSI, so I used PXE booting instead. Essentially:

  • the blade is set to boot from its NIC
  • the blade gets its IP address and the PXE boot server details in the DHCP reply
  • the blade pulls the kernel and initrd from the PXE server
  • the blade uses an iSCSI target LUN as its read/write root device

iSCSI targets

The iSCSI targets (one per blade) were created first on my ZFS server (NAS4Free). An added bonus is that we can zfs-snapshot each blade’s LUN before applying critical updates.

 

Extents (zvols):

Name                 Path
mielnet-compute016   /dev/zvol/tank/mielnet-compute016
mielnet-compute017   /dev/zvol/tank/mielnet-compute017
mielnet-compute018   /dev/zvol/tank/mielnet-compute018
mielnet-compute059   /dev/zvol/tank/mielnet-compute059

Targets:

Name                                               Flags   LUNs                                     PG   IG   AG
iqn.2007-09.jp.ne.peach.istgt:mielnet-compute016   rw      LUN0=/dev/zvol/tank/mielnet-compute016   1    1    1
iqn.2007-09.jp.ne.peach.istgt:mielnet-compute017   rw      LUN0=/dev/zvol/tank/mielnet-compute017   1    3    3
iqn.2007-09.jp.ne.peach.istgt:mielnet-compute018   rw      LUN0=/dev/zvol/tank/mielnet-compute018   1    4    4
iqn.2007-09.jp.ne.peach.istgt:mielnet-compute059   rw      LUN0=/dev/zvol/tank/mielnet-compute059   1    2    2

Initiator Groups:

Tag   Initiators   Networks          Comment
1     ALL          10.10.100.16/32   mielnet-compute016 Initiator Group
2     ALL          10.10.100.59/32   mielnet-compute059 Initiator Group
3     ALL          10.10.100.17/32   mielnet-compute017 Initiator Group
4     ALL          10.10.100.18/32   mielnet-compute018 Initiator Group
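
The snapshot-before-updates idea mentioned above could look roughly like this on the ZFS server. It is only a sketch: the pool name tank comes from the zvol paths listed here, while the pre-update label, the DRY_RUN switch and the helper names are made up for illustration.

```shell
# Sketch: snapshot a blade's zvol before patching the blade.
# Assumptions: pool "tank" as in the zvol paths above; the "pre-update"
# label, DRY_RUN and both function names are illustrative only.

snapshot_name() {
    # $1 = blade name, $2 = label, $3 = date stamp (YYYYMMDD)
    printf 'tank/%s@%s-%s\n' "$1" "$2" "$3"
}

snapshot_blade() {
    # $1 = blade name
    snap=$(snapshot_name "$1" pre-update "$(date +%Y%m%d)")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "zfs snapshot $snap"      # default: only print the command
    else
        zfs snapshot "$snap"           # run for real on the ZFS server
    fi
}

snapshot_blade mielnet-compute016
```

With DRY_RUN=0 the command is actually executed. If an update goes wrong, zfs rollback with the same snapshot name should bring the LUN back, with the blade powered off first.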

 

OS installation

I used the standard CentOS installer, using its advanced “Storage” option to add the iSCSI target. Note that the installation wizard got stuck at the GRUB installation phase; at that point I used the installer’s second console (ALT+F2) to scp the kernel and initrd image out to my PXE server.

 

DHCP service

We need a DHCP service to make this work: just standard DHCP reservations for my blades, plus a pointer to the PXE server living at 10.10.100.57:

# cat /etc/dhcp/dhcpd.conf
#########################
deny unknown-clients;
authoritative;
option dhcp-max-message-size 2048;
use-host-decl-names on;
ddns-update-style none;
option domain-name "mielnet.pl";
option domain-name-servers 8.8.8.8, 8.8.4.4 ;
default-lease-time 86400;
max-lease-time 86400;
log-facility local7;
option time-servers ntp0.mielnet.pl,inti.mielnet.pl ;
option ntp-servers ntp0.mielnet.pl,inti.mielnet.pl ;
#########################

subnet 10.10.100.0 netmask 255.255.255.0 {
option routers 10.10.100.254 ;
next-server 10.10.100.57 ;
filename "pxelinux.0";
option tftp-server-name "10.10.100.57";

}
host mielnet-compute016 {hardware ethernet 00:24:81:cf:xx:xx;fixed-address mielnet-compute016;}
host mielnet-compute017 {hardware ethernet 00:24:81:cf:xx:yy;fixed-address mielnet-compute017;}
host mielnet-compute018 {hardware ethernet 00:24:81:cf:xx:xy;fixed-address mielnet-compute018;}
host mielnet-compute059 {hardware ethernet 00:0c:29:02:xx:yx;fixed-address mielnet-compute059;}
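
With more than a handful of blades, the host lines can be generated rather than typed. A minimal sketch, assuming a plain "name MAC" list on standard input; the function names and the MAC addresses in the example are placeholders, not real hardware addresses.

```shell
# Sketch: print one dhcpd host entry per "name mac" pair read from stdin.
# The MAC addresses below are placeholders.

dhcp_host() {
    # $1 = host name (must resolve, because it is reused as fixed-address)
    # $2 = MAC address
    printf 'host %s {hardware ethernet %s;fixed-address %s;}\n' "$1" "$2" "$1"
}

dhcp_hosts() {
    while read -r name mac; do
        if [ -n "$name" ]; then
            dhcp_host "$name" "$mac"
        fi
    done
}

dhcp_hosts <<'EOF'
mielnet-compute016 00:24:81:00:00:16
mielnet-compute017 00:24:81:00:00:17
EOF
```

After pasting the output into dhcpd.conf, dhcpd -t -cf /etc/dhcp/dhcpd.conf checks the syntax without starting the daemon.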

PXE booting

The gethostip 10.10.100.16 command translates the IP address into hexadecimal format (0A0A6410 here), which is the name of the per-host config file pxelinux looks for. Then:

vim /var/lib/tftpboot/pxelinux.cfg/0A0A6410

# cat 0A0A6410
DEFAULT menu
PROMPT 0
MENU TITLE MIELNET IT Services || Boot Server
TIMEOUT 20
TOTALTIMEOUT 200
ONTIMEOUT Centos7-mielnet-compute016

LABEL Centos7-mielnet-compute016
MENU LABEL Centos7-mielnet-compute016
kernel /images/mielnet-compute016/vmlinuz-3.10.0-327.10.1.el7.x86_64 root=/dev/sda1 ro netroot=iscsi:mielnet-compute016:xxxxxxxx@10.10.100.51::3260::iqn.2007-09.jp.ne.peach.istgt:mielnet-compute016 rd.iscsi.initiator=iqn.1994-05.com.redhat:4b7c6d70242b vconsole.font=latarcyrheb-sun16 vconsole.keymap=uk LANG=en_GB.UTF-8  console=tty0 ip=enp2s0f0:dhcp  rhgb quiet
append initrd=/images/mielnet-compute016/initramfs-3.10.0-327.10.1.el7.x86_64.img

LABEL Centos7-mielnet-compute016-bridge
MENU LABEL Centos7-mielnet-compute016-bridge
kernel /images/mielnet-compute016/vmlinuz-3.10.0-327.10.1.el7.x86_64 root=/dev/sda1 ro netroot=iscsi:mielnet-compute016:xxxxxxxx@10.10.100.51::3260::iqn.2007-09.jp.ne.peach.istgt:mielnet-compute016 rd.iscsi.initiator=iqn.1994-05.com.redhat:4b7c6d70242b vconsole.font=latarcyrheb-sun16 vconsole.keymap=uk LANG=en_GB.UTF-8  bridge=br-ex:enp2s0f0 ip=br-ex:dhcp console=tty0 rd.shell rd.debug
append initrd=/images/mielnet-compute016/initramfs-3.10.0-327.10.1.el7.x86_64.img

LABEL Centos7-mielnet-compute016-rescue
MENU LABEL Centos7-mielnet-compute016-rescue
kernel /images/mielnet-compute016/vmlinuz-0-rescue-a8aafbe2565244fc8478818344af177d rescue vconsole.font=latarcyrheb-sun16 vconsole.keymap=uk LANG=en_GB.UTF-8 root=/dev/sda1 netroot=iscsi:mielnet-compute016:xxxxxxxxx@10.10.100.51::3260::iqn.2007-09.jp.ne.peach.istgt:mielnet-compute016 ip=enp2s0f0:dhcp rd.iscsi.initiator=iqn.1994-05.com.redhat:4b7c6d70242b
append initrd=/images/mielnet-compute016/initramfs-0-rescue-a8aafbe2565244fc8478818344af177d.img

MENU end
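
The per-host file under pxelinux.cfg/ is named after the client’s IP address in upper-case hexadecimal. If gethostip is not at hand, the conversion is easy to sketch in plain shell; ip_to_hex is a made-up helper name.

```shell
# Sketch: convert a dotted-quad IPv4 address into the upper-case hex
# file name that pxelinux looks for under pxelinux.cfg/.
ip_to_hex() {
    # Split the argument on dots, then print each octet as two hex digits.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_hex 10.10.100.16
```

For 10.10.100.16 this prints 0A0A6410. pxelinux also tries progressively shorter prefixes of that name before falling back to the file called default, so one file named after a common prefix can serve a whole range of addresses.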

Make sure to replace mielnet-compute016:xxxxxxxx with the CHAP user name and secret of your own iSCSI target.
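
The netroot= argument is easy to get wrong by one colon, so here is a small sketch that assembles it from its parts. It follows the layout used in the entries above (user, password, server, empty protocol field, port, empty LUN field, target name); netroot_arg and the CHANGEME password are placeholders.

```shell
# Sketch: assemble the netroot= argument used in the PXE entries above.
# Layout: iscsi:<user>:<password>@<server>::<port>::<target IQN>
# The two empty fields are protocol and LUN, left at their defaults.
netroot_arg() {
    # $1 = CHAP user, $2 = CHAP password, $3 = target server,
    # $4 = port, $5 = target IQN
    printf 'netroot=iscsi:%s:%s@%s::%s::%s\n' "$1" "$2" "$3" "$4" "$5"
}

netroot_arg mielnet-compute016 CHANGEME 10.10.100.51 3260 \
    iqn.2007-09.jp.ne.peach.istgt:mielnet-compute016
```

Keep in mind that the CHAP secret ends up in a world-readable TFTP file, so it is worth restricting each target to its blade’s IP address as well, as the initiator groups above do.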

Lastly, make sure the kernel and initrd images are in place:

 # ls -l /var/lib/tftpboot/images/mielnet-compute016/
total 172068
-rw-r--r--. 1 root root   126426 Nov 19  2015 config-3.10.0-327.el7.x86_64
drwxr-xr-x. 2 root root       26 Mar 16 17:19 grub
drwx------. 3 root root       19 Mar 16 17:20 grub2
-rw-r--r--. 1 root root 41572738 Mar 16 17:21 initramfs-0-rescue-a8aafbe2565244fc8478818344af177d.img
-rw-r--r--. 1 root root 20945730 Mar 23 14:20 initramfs-3.10.0-327.10.1.el7.x86_64.img
-rw-r--r--. 1 root root 21417384 Mar 16 17:21 initramfs-3.10.0-327.el7.x86_64.img
-rw-r--r--. 1 root root 20945730 Mar 23 14:49 initramfs.img
-rw-r--r--. 1 root root 41572738 Mar 16 17:21 initramfs-rescue.img
-rw-r--r--. 1 root root   602670 Mar 16 17:20 initrd-plymouth.img
-rw-r--r--. 1 root root   252612 Nov 19  2015 symvers-3.10.0-327.el7.x86_64.gz
-rw-------. 1 root root  2963044 Nov 19  2015 System.map-3.10.0-327.el7.x86_64
-rwxr-xr-x. 1 root root  5155536 Mar 23 14:50 vmlinuz
-rwxr-xr-x. 1 root root  5156528 Mar 16 17:22 vmlinuz-0-rescue-a8aafbe2565244fc8478818344af177d
-rwxr-xr-x. 1 root root  5155536 Feb 16  2016 vmlinuz-3.10.0-327.10.1.el7.x86_64
-rwxr-xr-x. 1 root root  5156528 Nov 19  2015 vmlinuz-3.10.0-327.el7.x86_64
-rwxr-xr-x. 1 root root  5156528 Mar 16 17:22 vmlinuz-rescue

That should get you going. The only downside I can see is that after upgrading the Linux kernel you need to copy the new kernel and initrd to the PXE server by hand and then change the kernel file name in the PXE config file. Fortunately, with CentOS that doesn’t happen very often, so I can live with it.
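
The manual step after a kernel upgrade can be scripted to some extent. The sketch below only rewrites the version in a pxelinux config file; it assumes the vmlinuz-<version> and initramfs-<version>.img naming used above and GNU sed, and switch_kernel is a made-up name. The copy itself is shown as a comment because user names and paths will differ.

```shell
# Sketch: point a pxelinux config file at a new kernel version.
# Assumes the file naming used above and GNU sed (for -i).

switch_kernel() {
    # $1 = pxelinux config file, $2 = old version, $3 = new version
    # Escape the dots so the old version is matched literally.
    old=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -i \
        -e "s|vmlinuz-$old|vmlinuz-$3|g" \
        -e "s|initramfs-$old\.img|initramfs-$3.img|g" \
        "$1"
}

# Copy step, run on the blade after the update (user and paths are assumptions):
#   scp /boot/vmlinuz-NEW /boot/initramfs-NEW.img \
#       root@10.10.100.57:/var/lib/tftpboot/images/mielnet-compute016/
# Then, on the PXE server:
#   switch_kernel /var/lib/tftpboot/pxelinux.cfg/0A0A6410 OLD NEW
```

The rescue entries are left alone because their file names do not contain the regular kernel version.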

Apart from that, I’ve been running these blades as compute nodes this way for a few months now with zero problems so far.

Feb 16 2016
 

Intro

I needed a job scheduling system for a single machine, to let a group of people run some number-crunching scripts. I decided to try SLURM and was surprised that there are no rpm repos or packages available for CentOS. Sadly, it ain’t as easy as apt-get install slurm-llnl.

But I managed to get it working in the end, and here is the journal from that journey.

 

We need EPEL repo

rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Installing required bits and bobs

yum install -y munge-devel munge-libs readline-devel perl-ExtUtils-MakeMaker openssl-devel pam-devel rpm-build perl-DBI perl-Switch munge mariadb-devel

Downloading the latest stable version of Slurm

From http://www.schedmd.com/#repos

Building rpm packages

rpmbuild -ta slurm-15.08.7.tar.bz2

Once done, install the rpms

ls -l ~/rpmbuild/RPMS/x86_64/*.rpm
rpm -Uvh ~/rpmbuild/RPMS/x86_64/*.rpm

or, even better, upload them to your custom Spacewalk software channel. Not using a Spacewalk server yet? Check it out: if you have more than a few CentOS boxes you’re going to love it, it’s awesome.

We may as well add a user for slurm at this stage; we are going to need it later.

useradd slurm
mkdir /var/log/slurm
chown slurm: /var/log/slurm

Install MariaDB

yum install mariadb-server -y
systemctl start mariadb
systemctl enable mariadb
mysql_secure_installation

# You can save the mysql root password in root's home directory.
# It is bad practice, but on the other hand,
# if someone can read root's home directory
# then we are in trouble anyway.

vim ~/.my.cnf
[client]
password = aksjdlowjedjw34dwnknxpw93e9032edwxbsx
# root can now open a mysql shell without typing the password.
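
Since that file holds a password in clear text, it should at least be readable by its owner only. A minimal sketch, assuming GNU coreutils; write_my_cnf is a made-up helper and the password argument is a placeholder.

```shell
# Sketch: write a client credentials file that only its owner can read.
write_my_cnf() {
    # $1 = target file, $2 = password
    (
        umask 077                      # a newly created file gets mode 0600
        printf '[client]\npassword = %s\n' "$2" > "$1"
    )
    chmod 600 "$1"                     # also tighten a pre-existing file
}

# Example (placeholder password):
#   write_my_cnf ~/.my.cnf 'your-mysql-root-password'
```

The umask runs in a subshell so that it does not leak into the rest of the session.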

Create SQL database

Start the mysql shell and run:

mysql> create database slurm_acct_db;
mysql> grant all on slurm_acct_db.* TO 'slurm'@'localhost'
-> identified by 'some_pass' with grant option;

Configure SLURM db backend

# egrep -v '^#|^$' /etc/slurm/slurmdbd.conf
AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
SlurmUser=slurm
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePass=some_pass
StorageUser=slurm
StorageLoc=slurm_acct_db

and start and enable the service

systemctl start slurmdbd
systemctl enable slurmdbd
systemctl status slurmdbd

After starting the service, your shiny new database should be populated with tables:

MariaDB [slurm_acct_db]> show tables;
+-------------------------+
| Tables_in_slurm_acct_db |
+-------------------------+
| acct_coord_table |
| acct_table |
| clus_res_table |
| cluster_table |
| qos_table |
| res_table |
| table_defs_table |
| tres_table |
| txn_table |
| user_table |
+-------------------------+
10 rows in set (0.01 sec)

 

Time to configure Munge auth daemon

create-munge-key
systemctl start munge
systemctl status munge
systemctl enable munge

And finally the actual SLURM daemon

Put something along these lines into your /etc/slurm/slurm.conf

# egrep -v '^#|^$' /etc/slurm/slurm.conf
ClusterName=efg
ControlMachine=efg01
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/home/slurm/tmp
SlurmdSpoolDir=/tmp/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/linuxproc
CacheGroups=0
ReturnToService=0
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SelectType=select/linear
FastSchedule=1
SlurmctldDebug=3
SlurmdDebug=3
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
AccountingStorageType=accounting_storage/slurmdbd
NodeName=efg01 CPUs=16 State=UNKNOWN
PartitionName=debug Nodes=efg01 Default=YES MaxTime=INFINITE State=UP
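
The NodeName line has to match the real hardware, so it can help to build it from the machine itself instead of typing the CPU count. A minimal sketch that only covers the CPU count; node_line is a made-up helper name.

```shell
# Sketch: build the NodeName line from the local machine.
# Only the CPU count is filled in; memory and socket layout are left out.
node_line() {
    # $1 = node name, $2 = CPU count
    printf 'NodeName=%s CPUs=%s State=UNKNOWN\n' "$1" "$2"
}

# Short host name (domain part stripped) and number of available CPUs.
node_line "$(uname -n | cut -d. -f1)" "$(nproc)"
```

Once the Slurm packages are installed, slurmd -C prints the detected hardware in slurm.conf format as well, which is a good cross-check.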

and see if your service can start

systemctl start slurm
systemctl status slurm
systemctl enable slurm
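
If the service refuses to start, one thing worth checking is whether the directories named in slurm.conf exist and are writable by the right user. The sketch below reads a key from the config and creates the directory; it assumes KEY=value lines spelled exactly as in the listing above, and conf_value and ensure_dir are made-up helper names.

```shell
# Sketch: read a key from slurm.conf and make sure its directory exists.
conf_value() {
    # $1 = key, $2 = config file; prints the value of the last matching line
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

ensure_dir() {
    # $1 = directory, $2 = owner (optional; changing the owner needs root)
    mkdir -p "$1"
    if [ -n "${2:-}" ]; then
        chown "$2" "$1"
    fi
}

# Example, run as root on the SLURM host:
#   ensure_dir "$(conf_value StateSaveLocation /etc/slurm/slurm.conf)" slurm
#   ensure_dir "$(conf_value SlurmdSpoolDir /etc/slurm/slurm.conf)"
```

StateSaveLocation is given to the slurm user here because the controller runs as SlurmUser and writes its state there.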

 

 

Testing SLURM

 

scontrol show daemons
srun --ntasks=16 --label /bin/hostname
sbatch # submit script
salloc # create job alloc and start shell, interactive
srun # create job alloc and launch job step, MPI
sattach # attach to a running job step
sinfo
sinfo --Node
sinfo -p debug
squeue -i60
squeue -u dyzio -t all
squeue -s -p debug
smap
sview
scontrol show partition
scontrol update PartitionName=debug MaxTime=60
scontrol show config
sacct -u dyzio
sacct -r debug
sstat
sreport
sacctmgr
sprio
sshare
sdiag
scancel --user=dyzio --state=pending
scancel 444445
strigger
# Submit a job array with index values between 0 and 31
sbatch --array=0-31 -N1 tmp
# Submit a job array with index values of 1, 3, 5 and 7
sbatch --array=1,3,5,7 -N1 tmp
# Submit a job array with index values between 1 and 7
# with a step size of 2 (i.e. 1, 3, 5 and 7)
sbatch --array=1-7:2 -N1 tmp
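
The job array commands above submit a script called tmp. A minimal sketch of what such a script might look like: each array task reads its index from SLURM_ARRAY_TASK_ID and picks its own input. The job name and the input-<n>.dat naming are made up for illustration.

```shell
#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --output=array-demo_%A_%a.out
# Sketch of a batch script for the job array examples above.
# %A is the array job id and %a the task index, so each task
# writes to its own output file.

input_for_task() {
    # $1 = array index; the file naming is illustrative only
    printf 'input-%s.dat\n' "$1"
}

task_id=${SLURM_ARRAY_TASK_ID:-0}   # falls back to 0 when run outside SLURM
echo "task $task_id would process $(input_for_task "$task_id")"
```

Because of the fallback to index 0, the script can be tried with plain bash before it is submitted with sbatch.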

 

 

Any troubles?

Check /var/log/messages and /var/log/slurm/slurmdbd.log, plus the output from

systemctl status slurm slurmdbd munge -l

That should get you started. Drop a comment below if it did.