Intro To Slurm

Slurm is an open source cluster scheduler that: 1) Tracks available resources on a cluster and collects users' job requests. 2) Assigns priorities to jobs and runs them on assigned compute nodes. 3) Groups compute nodes into logical partitions depending on hardware characteristics.

Uploaded by

cquinto


SLURM

JOB SCHEDULER
WHAT IS SLURM?
Slurm is an open source cluster management and job scheduling system for Linux clusters.

1 Keeps track of available resources on the cluster

2 Collects users' resource requests for jobs

3 Assigns priorities to jobs

4 Runs jobs on assigned compute nodes

www.slurm.schedmd.com
PARTITIONS

Compute nodes are grouped into logical sets called partitions, depending on their hardware characteristics or function:

production (default)   Standard CPU nodes
debug                  Standard CPU nodes for debug (fast allocation times)
maxwell                Nodes with Nvidia Maxwell GPUs
pascal                 Nodes with Nvidia Pascal GPUs
mic                    Nodes with Intel Xeon Phi cards

Ask ACCRE if you would like to get access to specific partitions.
JOB EXECUTION WORKFLOW

1. DETERMINE THE RESOURCES NECESSARY FOR THE SPECIFIC JOB

2. CREATE A BATCH JOB SCRIPT

3. SUBMIT THE JOB TO THE SCHEDULER

4. CHECK JOB STATUS

5. RETRIEVE JOB INFORMATION

DETERMINE RESOURCES FOR JOB

NUMBER OF CPU CORES

• From 1 to the maximum allowed for your group’s account.
• Default is one CPU core.

AMOUNT OF MEMORY

• Up to 246 GB per node.
• Default is 1 GB per core.

GB per node   # nodes
20            90
44            45
58            55
120           344
246           44

TIME

• Job duration on production can be set up to 14 days.
• Default is 15 minutes.
• DEBUG QUEUE: max 30 minutes

Slurm will immediately kill your job if your process exceeds the requested amount of resources.

Slightly overestimate the requested job resources, but do not greatly overestimate them, to avoid unnecessarily long wait times.
DETERMINE RESOURCES FOR JOB - BACKFILL
Backfill scheduling will start lower priority jobs if doing so does not delay the expected start time of
any higher priority job.


[Figure: backfill example. Rows numbered 1 to 12, horizontal axis in hours (1 to 8). Jobs shown: John, 12 CPU cores, 1 week; Mark, 2 CPU cores, 5 hours; Lucy, 1 CPU core, 7 hours.]
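
The backfill rule above can be sketched as a simple check. This is a deliberately simplified illustration, not Slurm's actual algorithm: the function name, its arguments and the numbers in the example are all assumptions made up for this sketch.

```shell
# Simplified backfill test: a lower priority job may start now if it fits
# in the idle cores AND its time limit ends before the higher priority job
# is expected to start. Names and numbers are illustrative assumptions.
can_backfill() {
    idle_cores=$1      # cores currently free
    window_hours=$2    # hours until the higher priority job is expected to start
    job_cores=$3       # cores requested by the candidate job
    job_hours=$4       # time limit requested by the candidate job
    [ "$job_cores" -le "$idle_cores" ] && [ "$job_hours" -le "$window_hours" ]
}

# Hypothetical situation: 3 idle cores, higher priority job expected in 6 hours.
if can_backfill 3 6 2 5; then echo "2 cores for 5 hours: can start now"; fi
if ! can_backfill 3 6 1 7; then echo "1 core for 7 hours: must wait"; fi
```

The second job is smaller but asks for more time than the idle window allows, which is one reason a tight (but safe) time request can shorten the wait.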
DETERMINE RESOURCES FOR JOB - OPTIMIZATION

How to define the right amount of resources for my job?

1. Select a representative job
2. Overestimate resources requested
3. Run test job
4. Check actual resources utilization
5. Optimize resources request
6. Run production jobs

Optimized resources → Lower queue wait time → More research
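
One way to turn the measured usage of a test job into the next request is to add a small safety margin and round up. The sketch below does this for memory. The helper name and the 20% default margin are assumptions chosen for illustration, not an ACCRE recommendation.

```shell
# Suggest a --mem value from the peak memory (in MB) a test job actually used.
# Hypothetical helper: the 20% default margin is an arbitrary example value.
suggest_mem_request() {
    used_mb=$1
    margin_pct=${2:-20}
    awk -v used="$used_mb" -v pct="$margin_pct" 'BEGIN {
        padded = used * (100 + pct) / 100     # add the safety margin
        gb = int(padded / 1024)
        if (gb * 1024 < padded) gb++          # round up to a whole GB
        print gb "G"
    }'
}

# A test job that peaked at 3500 MB:
suggest_mem_request 3500
```

The printed value can be used directly in a directive such as `#SBATCH --mem=5G`.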

CREATE A BATCH JOB SCRIPT
A batch job consists of a sequence of commands listed in a file with the purpose of being executed by
the OS as a single instruction.

SHEBANG
• Specify the script interpreter (Bash)
• Must be the first line!

SLURM DIRECTIVES
• Start with “#SBATCH”: parsed by Slurm but ignored by Bash.
• Can be separated by spaces.
• Comments between and after directives are allowed.
• Must be before actual commands!

SCRIPT COMMANDS
• Commands you want to execute on the compute nodes.

myjob.slurm

    #!/bin/bash

    #SBATCH --nodes=1    # Nodes
    #SBATCH --ntasks=1
    #SBATCH --mem=1G
    # Max job duration
    #SBATCH --time=1-06:30:00
    #SBATCH --job-name=myjob
    #SBATCH --output=myjob.out

    # Just a comment
    setpkgs -a python

    ./myprogram
CREATE A BATCH JOB SCRIPT - THE ESSENTIALS

--nodes=N
• Request N nodes to be allocated. (Default: N=1)

--ntasks=N
• Request N tasks to be allocated. (Default: N=1)
• Unless otherwise specified, one task maps to one CPU core.

--mem=NG
• Request N gigabytes of memory per node. (Default: N=1)

--time=d-hh:mm:ss
• Request d days, hh hours, mm minutes and ss seconds. (Default: 00:15:00)

--job-name=<string>
• Specify a name for the job allocation. (Default: batch file name)

--output=<file_name>
• Write the batch script’s standard output in the specified file.
• If not specified, the output will be saved in the file: slurm-<jobid>.out
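
To make the d-hh:mm:ss format concrete, the sketch below converts a time request into seconds. The helper is hypothetical (it is not a Slurm command) and it only handles the d-hh:mm:ss and hh:mm:ss forms shown on this slide.

```shell
# Convert a d-hh:mm:ss (or hh:mm:ss) time request into seconds.
# Hypothetical helper for illustration only.
slurm_time_to_seconds() {
    case $1 in
        *-*) spec=$1 ;;        # already in d-hh:mm:ss form
        *)   spec="0-$1" ;;    # hh:mm:ss only, so zero days
    esac
    echo "$spec" | awk -F'[-:]' '{ print $1*86400 + $2*3600 + $3*60 + $4 }'
}

slurm_time_to_seconds 1-06:30:00   # the request used in myjob.slurm
slurm_time_to_seconds 00:15:00     # the default time limit
```

The first call prints 109800 (1 day, 6 hours and 30 minutes) and the second prints 900.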
CREATE A BATCH JOB SCRIPT - EMAIL NOTIFICATION

--mail-user=<address>
• Send email to address.
• It accepts multiple comma separated addresses.

--mail-type=<event>
• Define the events for which you want to be notified:

BEGIN Job begins

END Job ends
FAIL Job fails
ALL BEGIN+END+FAIL
TIME_LIMIT_50 Elapsed time reaches 50% of allocated time
TIME_LIMIT_80 Elapsed time reaches 80% of allocated time
TIME_LIMIT_90 Elapsed time reaches 90% of allocated time

SUBMIT JOB TO THE SCHEDULER

sbatch batch_file
• Submit batch_file to Slurm.
• If successful, it returns the job ID of the submitted job.

SUBMISSION
Job is added to the queue.

PRIORITY
A priority value is assigned to the job.

WAIT
Job waits in queue until:
1. Resources are available
2. There are no jobs with higher priority in queue

ALLOCATION AND EXECUTION

How do I remove a job from the queue?

scancel jobid
• Cancel the job corresponding to the given jobid from the queue.
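
When scripting submissions it helps to keep the job ID for later use with scancel. On success sbatch prints a line of the form "Submitted batch job 1234567"; the sketch below extracts the ID from such a line. The helper name is an assumption, and the sample line is passed in as text so the example runs without Slurm.

```shell
# Extract the job ID (fourth word) from the line sbatch prints on success.
# Hypothetical helper for illustration only.
job_id_from_sbatch() {
    echo "$1" | awk '{ print $4 }'
}

job_id_from_sbatch "Submitted batch job 1234567"

# On a cluster this could be combined as follows (not executed here):
#   jobid=$(job_id_from_sbatch "$(sbatch myjob.slurm)")
#   scancel "$jobid"
```

sbatch also has a `--parsable` option that prints the job ID without the surrounding text, which may be simpler where it is available.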
SUBMIT JOB TO THE SCHEDULER

How is my job’s priority calculated?

PRIORITY combines three factors: FAIRSHARE, AGE and JOB SIZE.

FAIRSHARE
Prioritizes jobs belonging to under-serviced accounts.
It reflects:
1. The share of resources contributed by your research group.
2. The historical amount of computing resources consumed by your account.

AGE
The longer the job waits in queue, the larger its age factor becomes.

JOB SIZE
Jobs requesting more CPUs are favored.
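
Slurm's multifactor priority plugin combines factors like these as a weighted sum, with each factor scaled between 0 and 1. The sketch below shows the idea for the three factors on this slide only. The function name and the weights are made-up placeholders, not ACCRE's configuration, and the real calculation includes further terms.

```shell
# Weighted sum of three priority factors, each expected between 0.0 and 1.0.
# The weights are hypothetical placeholders, not any cluster's settings.
job_priority() {
    awk -v fs="$1" -v age="$2" -v size="$3" 'BEGIN {
        w_fs = 2000; w_age = 1000; w_size = 500    # hypothetical weights
        printf "%.0f\n", w_fs * fs + w_age * age + w_size * size
    }'
}

# fairshare 0.25, age 0.5, job size 0.1
job_priority 0.25 0.5 0.1
```

With these placeholder weights the example prints 1050, and a job that keeps waiting sees only its age term grow.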
CHECK JOB STATUS

squeue -u vunetid
• Show the queued jobs for user vunetid.

STATUS
R = Running
PD = Pending
CA = Cancelled

NODELIST (REASON)
• For running jobs shows the allocated nodes.
• For pending jobs shows the wait reason:

Priority            Other jobs in queue have higher priority.
Resources           Insufficient resources available on the cluster.
AssocGrpCpuLimit    Reached maximum number of allocated CPUs by all jobs belonging to the user’s account.
AssocGrpMemLimit    Reached maximum amount of allocated memory by all jobs belonging to the user’s account.
AssocGrpTimeLimit   Reached maximum amount of allocated time by all jobs belonging to the user’s account.
RETRIEVE JOB INFORMATION

rtracejob jobid
• Print requested and utilized resources (and more) for the given jobid.

The used memory may not be an exact value.
Take it with reservations.
JOB ARRAYS

Submit multiple similar jobs with a single job batch script.

Each job within the array is assigned a unique task ID.

--array=start-end[:step][%limit]

• Define task ID interval from start to end as unsigned integer values.
• The step between successive values can be set after the colon sign.
• Set the limit to the number of simultaneously running jobs with “%”.
• Individual task IDs can be specified as a comma separated values list.

--array=0-7         0, 1, 2, 3, 4, 5, 6, 7
--array=1-13:3      1, 4, 7, 10, 13
--array=2,3,6,15    2, 3, 6, 15

All jobs in a job array must have the same resource requirements.

The maximum array size is 30,000 jobs.

Significantly shorter submission times than submitting jobs individually.
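
To check which task IDs an --array value will generate before submitting, the expansion rules above can be reproduced in a few lines. This is a sketch of the rules as described on this slide, not Slurm's own parser, and the function name is an assumption.

```shell
# Expand an --array specification into the list of task IDs it describes.
# Hypothetical helper for illustration only.
expand_array_spec() {
    # Drop an optional %limit suffix; it caps concurrency, not the ID list.
    spec=$(echo "$1" | cut -d'%' -f1)
    echo "$spec" | tr ',' '\n' | awk -F'[-:]' '
        NF == 1 { print $1; next }                 # a single task ID
        {
            step = (NF >= 3) ? $3 + 0 : 1          # step defaults to 1
            for (i = $1 + 0; i <= $2 + 0; i += step) print i
        }
    ' | paste -s -d ' ' -
}

expand_array_spec 0-7
expand_array_spec 1-13:3
expand_array_spec 2,3,6,15
```

The three calls print the same ID lists as the examples on the slide, separated by spaces.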
JOB ARRAYS

How to select different input/output for each job in the array?

Use Slurm environment variable: SLURM_ARRAY_TASK_ID

The task ID for the specific job in the array.

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --time=00:05:00
    #SBATCH --job-name=job_array
    #SBATCH --array=1-4
    #SBATCH --output=job_%A_task_%a.out

    my_program file_${SLURM_ARRAY_TASK_ID}

EXECUTED COMMAND      OUTPUT FILE
my_program file_1     job_1234567_task_1.out
my_program file_2     job_1234567_task_2.out
my_program file_3     job_1234567_task_3.out
my_program file_4     job_1234567_task_4.out
JOB ARRAYS

What if my input files do not have a numerical index?

    #!/bin/bash
    #SBATCH
    …
    myfile=$( ls DataDir | awk -v line=${SLURM_ARRAY_TASK_ID} '{if (NR==line) print $0}' )

    my_program ${myfile}

1 ls DataDir: get the list of file names in the data directory in alphabetical order
2 |: send the list to awk
3 -v line=${SLURM_ARRAY_TASK_ID}: pass the value of the bash variable SLURM_ARRAY_TASK_ID to the awk variable “line”
4 '{if (NR==line) print $0}': print only the line of the file list whose line number (NR) equals the job's task ID
5 my_program ${myfile}: pass the file name in the myfile variable to the main program
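
The same selection can be tried outside Slurm by wrapping it in a small function. This sketch swaps the awk filter for `sed -n "Np"`, which prints only line N. The function name is an assumption, and the fallback to task ID 1 exists only so the snippet runs when SLURM_ARRAY_TASK_ID is not set.

```shell
# Print the Nth entry (1-based) of a directory listing sorted by name.
# Hypothetical helper for illustration only.
nth_file() {
    dir=$1
    n=$2
    ls "$dir" | sed -n "${n}p"
}

# Inside an array job the index would come from Slurm; default to 1 elsewhere.
task_id=${SLURM_ARRAY_TASK_ID:-1}
echo "Task ${task_id} would process: $(nth_file . "$task_id")"
```

Because the mapping depends on the listing order, the data directory should not change while the array is running.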
MULTITHREADED JOBS

POSIX THREADS

• Single task with multiple concurrent execution threads.

• Each thread uses a single CPU core.
• All threads share the same allocated memory.

Single node only!

Example layouts:
• 1 node, 1 task, 8 CPUs per task
• 1 node, 2 tasks, 4 CPUs per task
MULTITHREADED JOBS

--cpus-per-task=N
• Request N CPU cores to be allocated for each task.

With OpenMP in your batch script don’t forget to set:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
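
A minimal sketch of a multithreaded batch script follows. Bash does not allow spaces around "=" in an assignment, so the export must be written without them. The fallback to 1 thread is an addition for this example so the script also behaves sensibly outside Slurm, and the program name in the last comment is a placeholder.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Match the thread count to the cores Slurm allocated to the task.
# Fall back to 1 thread if the variable is unset (e.g. outside Slurm).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${OMP_NUM_THREADS} OpenMP threads"

# ./my_openmp_program    # placeholder for the actual executable
```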
DISTRIBUTED MEMORY JOBS

MESSAGE PASSING INTERFACE (MPI)

• Multiple tasks with private memory allocations.

• Tasks exchange data through communications.
• Tasks can reside on the same node or on multiple nodes.

Single or multiple nodes

2 nodes
8 tasks per node
1 CPU per task
DISTRIBUTED MEMORY JOBS

--nodes=N
• Request N nodes to be allocated.

--tasks-per-node=N
• Request N tasks per node.
• Unless otherwise specified, one task maps to one CPU core.

In the batch script, run the MPI program with:

srun ./program_name

• Run MPI program called program_name.

Do not use mpirun or mpiexec! srun will use the correct launcher for the MPI library you selected via setpkgs.

For OpenMPI only, add the following flag to srun:

srun --mpi=pmi2 ./program_name
INTERACTIVE SHELL JOB

salloc options
• Obtain job allocation with shell access.
• Accepts all the same options previously seen for sbatch.

[Figure: gateway and compute node]

Recommended for debugging and benchmarking sessions.
TROUBLESHOOTING

Why is my job still pending?

1 Check overall cluster utilization

SlurmActive -m mem
• Show the overall cluster utilization.
• Count as available cores only the ones with at least mem amount of memory (in GB). Default: 1GB

2 Check your account’s resource use

qSummary -g group
• Show the total number of jobs and CPU cores allocated or waiting for allocation for the selected group.

3 Check your account limits

showLimits -g group
• Show the cluster resource limits for a specific group.

Users in the same group share the same amount of resources.
TROUBLESHOOTING

Why did my job fail?

1 Check with rtracejob: a non-zero exit code means your application failed.

2 Check the job’s output file for error messages.

3 Check your Slurm batch job script for syntax or logic errors.

www.accre.vanderbilt.edu/slurm
NEED MORE HELP?

Check our Frequently Asked Questions webpage:

www.accre.vanderbilt.edu/faq

Submit a ticket from the helpdesk:

www.accre.vanderbilt.edu/help

Open a ticket to request an appointment with an ACCRE specialist.

DO NOT submit tickets in “Rush cluster”!

Rush tickets are for cluster-wide issues only.
