Slurm 18.08 Overview
Brian Christiansen
SchedMD
● Heterogeneous environments
● Burst buffer enhancements
● Fault tolerance
● Cloud computing
● Queue stuffing
● New TRES
● New TRES reporting options
● And more...
Copyright 2018 SchedMD LLC
http://www.schedmd.com
Heterogeneous Job Steps (MPI)
● ResumeFailProgram
○ The program that will be executed when nodes fail to resume by
ResumeTimeout. The argument to the program will be the names of the
failed nodes (using Slurm's hostlist expression format).
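As an illustration of what a ResumeFailProgram might do with its argument, here is a minimal Python sketch. The function names (expand_hostlist, handle_failed_nodes) and the node names are hypothetical, and the expander only handles simple expressions with one bracket group per name. For full hostlist syntax, a site would more likely call Slurm's own "scontrol show hostnames".

```python
import re


def expand_hostlist(expr):
    """Expand a simple hostlist expression such as 'cloud[1-3,7],gpu05'.

    Assumption: at most one bracket group per name and no nested brackets.
    """
    hosts = []
    # Split on commas that are not inside a bracket group.
    for part in re.split(r",(?![^\[]*\])", expr):
        if not part:
            continue
        match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", part)
        if not match:
            hosts.append(part)  # plain host name, nothing to expand
            continue
        prefix, ranges, suffix = match.groups()
        for item in ranges.split(","):
            low, _, high = item.partition("-")
            high = high or low
            width = len(low)  # preserve zero padding, e.g. 008-010
            for number in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


def handle_failed_nodes(hostlist_expr):
    """Return one log message per node that failed to resume.

    Placeholder action only: a real handler might alert an operator
    or clean up the cloud instance instead.
    """
    return [f"resume failed: {host}" for host in expand_hostlist(hostlist_expr)]
```

A wrapper script configured as ResumeFailProgram would pass its first command-line argument (the hostlist expression) to handle_failed_nodes and log the returned messages.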
● New TRES
○ fs/disk, fs/lustre, ic/ofed, vmem, pages
● New default TRES
○ cpu, mem, energy, node, billing, fs/disk, vmem, pages
● AcctGather{FileSystem|Infiniband}Type
○ Not just for profiling anymore.
○ Must define fs/lustre and/or ic/ofed in AccountingStorageTRES
● TresUsage{In|Out}{Ave|Min|Max|Tot}
○ NOTE: For Ave[RSS|VM]Size, or their values in
TRESUsageIn[Ave|Tot], the figures represent the average/total of the highest
watermarks over all ranks in the step. When using sstat, they represent
the average/total at the moment the command was run.
○ NOTE: TRESUsage*Min* values represent the lowest high water mark in
the step.
● TresUsage{In|Out}{MinNode|MinTask|MaxNode|MaxTask}
○ Node/task that reached min/max usage
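The Ave/Min/Max/Tot and MinNode/MinTask/MaxNode/MaxTask semantics described above can be sketched with a small Python example. This only illustrates the definitions on the slide. It is not how Slurm computes the values internally, and the function name, node names, and numbers are made up.

```python
def summarize_high_water_marks(per_task):
    """Summarize per-task peak usage of a single TRES.

    per_task maps (node_name, task_id) -> highest usage that task reached.
    The keys of the result mirror the field suffixes on the slide.
    """
    if not per_task:
        raise ValueError("need at least one task")
    min_key = min(per_task, key=per_task.get)  # task with the lowest peak
    max_key = max(per_task, key=per_task.get)  # task with the highest peak
    total = sum(per_task.values())
    return {
        "Tot": total,                    # sum of all high water marks
        "Ave": total / len(per_task),    # average high water mark
        "Min": per_task[min_key],        # lowest high water mark
        "MinNode": min_key[0],
        "MinTask": min_key[1],
        "Max": per_task[max_key],        # highest high water mark
        "MaxNode": max_key[0],
        "MaxTask": max_key[1],
    }


# Hypothetical peak memory (in MB) for three tasks of one step.
peaks = {("node1", 0): 400, ("node1", 1): 250, ("node2", 2): 900}
summary = summarize_high_water_marks(peaks)
```

Here Min is 250 (task 1 on node1) even though every task may have dipped lower at some point, because each input value is already that task's peak.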
Other, for users