Page Contents
- 1 Intro
- 2 We need EPEL repo
- 3 Installing required bits and bobs
- 4 Downloading the latest stable version of Slurm
- 5 Building rpm packages
- 6 Once done install rpms
- 7 Install MariaDB
- 8 Create SQL database
- 9 Configure SLURM db backend
- 10 Time to configure Munge auth daemon
- 11 And finally the actual SLURM daemon
- 12 Testing SLURM
- 13 Any troubles?
Intro
I needed a job scheduling system for a single machine, to allow a group of people to run some number-crunching scripts. I decided to try SLURM and was surprised that there are no RPM repos/packages available for CentOS; sadly it isn't as easy as apt-get install slurm-llnl …
But I managed to get it working in the end, and here you can find a journal of that journey.
We need EPEL repo
rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Installing required bits and bobs
yum install -y munge-devel munge-libs readline-devel perl-ExtUtils-MakeMaker openssl-devel pam-devel rpm-build perl-DBI perl-Switch munge mariadb-devel
Downloading the latest stable version of Slurm
From http://www.schedmd.com/#repos
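For example (a sketch only; check schedmd.com for the current download URL, here using the 15.08.7 tarball built in the next step):
wget https://download.schedmd.com/slurm/slurm-15.08.7.tar.bz2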
Building rpm packages
rpmbuild -ta slurm-15.08.7.tar.bz2
Once done install rpms
ls -l ~/rpmbuild/RPMS/x86_64/*.rpm
rpm -Uvh ~/rpmbuild/RPMS/x86_64/*.rpm
or, even better, upload them to your custom Spacewalk software channel. Not using a Spacewalk server yet? Check it out; if you manage more CentOS boxes you are going to love it, it's awesome.
We may also add a user for SLURM at this stage; we are going to need it later.
useradd slurm
mkdir /var/log/slurm
chown slurm. /var/log/slurm
Install MariaDB
yum install mariadb-server -y
systemctl start mariadb
systemctl enable mariadb
mysql_secure_installation
# you can save the mysql root password in root's home dir;
# bad practice, but on the other hand,
# if someone can access root's home dir
# then we are in trouble anyway
vim ~/.my.cnf
[client]
password = aksjdlowjedjw34dwnknxpw93e9032edwxbsx
# now root gets a password-less mysql root shell.
Create SQL database
Start the mysql shell and run:
mysql> grant all on slurm_acct_db.* TO 'slurm'@'localhost'
    -> identified by 'some_pass' with grant option;
mysql> create database slurm_acct_db;
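To confirm the grant works, you can check that the slurm user can see the new database (a quick check, assuming the same 'some_pass' password as above):
mysql -u slurm -psome_pass -e 'show databases;'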
Configure SLURM db backend
# egrep -v '^#|^$' /etc/slurm/slurmdbd.conf
AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
SlurmUser=slurm
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePass=some_pass
StorageUser=slurm
StorageLoc=slurm_acct_db
and enable the service
systemctl start slurmdbd
systemctl enable slurmdbd
systemctl status slurmdbd
After starting the service your shiny new database should be populated with tables:
MariaDB [slurm_acct_db]> show tables;
+-------------------------+
| Tables_in_slurm_acct_db |
+-------------------------+
| acct_coord_table        |
| acct_table              |
| clus_res_table          |
| cluster_table           |
| qos_table               |
| res_table               |
| table_defs_table        |
| tres_table              |
| txn_table               |
| user_table              |
+-------------------------+
10 rows in set (0.01 sec)
Time to configure Munge auth daemon
create-munge-key
systemctl start munge
systemctl status munge
systemctl enable munge
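As a quick sanity check you can encode and then decode a credential locally, the standard Munge self-test:
munge -n | unmunge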
And finally the actual SLURM daemon
Stick something along these lines into your /etc/slurm/slurm.conf
# egrep -v '^#|^$' /etc/slurm/slurm.conf
ClusterName=efg
ControlMachine=efg01
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/home/slurm/tmp
SlurmdSpoolDir=/tmp/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/linuxproc
CacheGroups=0
ReturnToService=0
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SelectType=select/linear
FastSchedule=1
SlurmctldDebug=3
SlurmdDebug=3
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
AccountingStorageType=accounting_storage/slurmdbd
NodeName=efg01 CPUs=16 State=UNKNOWN
PartitionName=debug Nodes=efg01 Default=YES MaxTime=INFINITE State=UP
and see if your service can start
systemctl start slurm
systemctl status slurm
systemctl enable slurm
Testing SLURM
scontrol show daemons
srun --ntasks=16 --label /bin/hostname
sbatch   # submit script
salloc   # create job alloc and start shell, interactive
srun     # create job alloc and launch job step, MPI
sattach
sinfo
sinfo --Node
sinfo -p debug
squeue -i60
squeue -u dyzio -t all
squeue -s -p debug
smap
sview
scontrol show partition
scontrol update PartitionName=debug MaxTime=60
scontrol show config
sacct -u dyzio
sacct -p debug
sstat
sreport
sacctmgr
sprio
sshare
sdiag
scancel --user=dyzio --state=pending
scancel 444445
strigger
# Submit a job array with index values between 0 and 31
sbatch --array=0-31 -N1 tmp
# Submit a job array with index values of 1, 3, 5 and 7
sbatch --array=1,3,5,7 -N1 tmp
# Submit a job array with index values between 1 and 7
# with a step size of 2 (i.e. 1, 3, 5 and 7)
sbatch --array=1-7:2 -N1 tmp
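The job array examples above submit a script (here called tmp); a minimal batch script for this setup could look something like the sketch below (the partition name matches the debug partition from slurm.conf above, everything else is just an example):
#!/bin/bash
#SBATCH --partition=debug
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
# print which array task ran on which host
echo "Task ${SLURM_ARRAY_TASK_ID:-n/a} on $(hostname)"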
Any troubles?
Check /var/log/messages, /var/log/slurm/slurmdbd.log and the output from
systemctl status slurm slurmdbd munge -l
That should get you started. Drop a comment below if it did.
Hi All,
I am stuck right at the start: munge is not running.
The error shown is:
systemctl start munge
Job for munge.service failed because the control process exited with error code.
That would be a sign of missing munge keys. Did you create the key with "create-munge-key"? Have you looked at the output from journalctl --no-pager?
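For example, to see the last log lines for the munge unit (a sketch, adjust the unit name and line count as needed):
journalctl -u munge --no-pager -n 50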
Hi,
I’m trying to build the RPM package but I keep getting this error message
RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.uHcn4P (%build)
Try installing the GCC compiler
# yum install gcc
[root@inportal spool]# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
Installing slurm-16.05.10-2
Munge is up and running great. I cannot start slurmctld. I get the error "fatal: _create_clustername_file: failed to create file /var/spool/clustername".
Any thoughts?
My gut feeling is that this is SELinux's doing. Check out /var/log/audit/audit.log.
Try switching off SELinux temporarily with "setenforce 0" and then starting slurmctld again.
If it is indeed SELinux, then you can create a custom SELinux module.
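A rough sketch of that workflow, assuming the denials show up in the audit log (the module name slurmctld_local is arbitrary; audit2allow comes from the policycoreutils-python package on CentOS 7):
grep slurmctld /var/log/audit/audit.log | audit2allow -M slurmctld_local
semodule -i slurmctld_local.pp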
Is it possible to build these rpms so they install in an alternate location (ex. /opt/slurm)?
Awesome walkthrough!
My question is this: if I want to establish a cluster, I’ll need several machines to share the slurm config.
1) Should I propagate slurm.conf manually or set it up on a shared folder (NAS NFS for example)?
2) is the munge key supposed to be the same everywhere?
Cheers
You can try the OpenHPC project: https://openhpc.community
1) I use a bash script to distribute the config file (see the sketch after this list); NFS is a good idea for batch scripts
2) yes
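A minimal sketch of such a distribution script (hostnames and paths are placeholders, adjust to your cluster):
#!/bin/bash
# copy slurm.conf to every node and restart slurmd there
for host in node01 node02 node03; do
    scp /etc/slurm/slurm.conf root@"$host":/etc/slurm/slurm.conf
    ssh root@"$host" systemctl restart slurmd
done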
Dear all,
I installed SLURM version 17.11.13-2 on my CentOS 7 machine and everything was fine up to the configuration of the slurm.conf file. When I start the slurm and slurmd services I get the errors "Failed to start slurm.service: Unit not found." and "Failed to start Slurm node daemon." Kindly help me figure out how to solve this issue and how to run SLURM after setting up slurm.conf.
We want to install SLURM, so could anyone please tell us how to configure it and send the steps?