Running Fluent Using A Load Manager
ANSYS, Ansys Workbench, AUTODYN, CFX, FLUENT and any and all ANSYS, Inc. brand, product, service and feature
names, logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries located in the
United States or other countries. ICEM CFD is a trademark used by ANSYS, Inc. under license. CFX is a trademark
of Sony Corporation in Japan. All other brand, product, service and feature names or trademarks are the property
of their respective owners. FLEXlm and FLEXnet are trademarks of Flexera Software LLC.
Disclaimer Notice
THIS ANSYS SOFTWARE PRODUCT AND PROGRAM DOCUMENTATION INCLUDE TRADE SECRETS AND ARE CONFIDENTIAL AND PROPRIETARY PRODUCTS OF ANSYS, INC., ITS SUBSIDIARIES, OR LICENSORS. The software products
and documentation are furnished by ANSYS, Inc., its subsidiaries, or affiliates under a software license agreement
that contains provisions concerning non-disclosure, copying, length and nature of use, compliance with exporting
laws, warranties, disclaimers, limitations of liability, and remedies, and other provisions. The software products
and documentation may be used, disclosed, transferred, or copied only in accordance with the terms and conditions
of that software license agreement.
ANSYS, Inc. and ANSYS Europe, Ltd. are UL registered ISO 9001:2015 companies.
For U.S. Government users, except as specifically granted by the ANSYS, Inc. software license agreement, the use,
duplication, or disclosure by the United States Government is subject to restrictions stated in the ANSYS, Inc.
software license agreement and FAR 12.212 (for non-DOD licenses).
Third-Party Software
See the legal information in the product help files for the complete Legal Notice for ANSYS proprietary software
and third-party software. If you are unable to access the Legal Notice, contact ANSYS, Inc.
Running Fluent Using a Load Manager
1. Introduction
1.1. Overview of Fluent and SGE Integration
1.1.1. Requirements
1.1.2. Fluent and SGE Communication
1.1.3. Checkpointing Directories
1.1.4. Checkpointing Trigger Files
1.1.5. Default File Location
2. Configuring SGE for Fluent
2.1. General Configuration
2.2. Checkpoint Configuration
2.3. Configuring Parallel Environments
2.4. Default Request File
3. Running a Fluent Simulation under SGE
3.1. Submitting a Fluent Job from the Command Line
3.2. Submitting a Fluent Job Using Fluent Launcher
3.3. Using Custom SGE Scripts
4. Running Fluent Utility Scripts under SGE
Part 4: Running Fluent Under Slurm
About This Document
1. Introduction
1.1. Requirements for Running Fluent Jobs with Slurm
2. Running Fluent Simulation under Slurm
2.1. Using the Integrated Slurm Capability
2.1.1. Overview
2.1.2. Usage
2.1.2.1. Submitting a Fluent Job from the Command Line
2.1.2.2. Submitting a Fluent Job Using Fluent Launcher
2.1.3. Examples
2.1.4. Limitations
2.2. Using Your Own Supplied Job Script
2.3. Epilogue/Prologue
2.4. Monitoring/Manipulating Jobs
2.4.1. Monitoring the Progress of a Job
2.4.2. Removing a Job from the Queue
List of Figures
4.1. The Scheduler Tab of Fluent Launcher (Linux Version)
2.1. The Scheduler Tab of Fluent Launcher (Linux Version)
3.1. The Scheduler Tab of Fluent Launcher (Linux Version)
2.1. The Scheduler Tab of Fluent Launcher (Linux Version)
Part 1: Running Fluent Under LSF
This document is made available via the Ansys, Inc. website for your convenience. Contact Platform
Computing Inc. (http://www.platform.com/) directly for support of their product.
Chapter 1: Introduction
Platform Computing’s LSF software is a distributed computing resource management tool that you can
use with either the serial or the parallel version of Ansys Fluent. This document provides general information about running Fluent under LSF, and is made available via the Ansys, Inc. website for your convenience. Contact Platform directly for support of their product.
Using LSF, Fluent simulations can take full advantage of LSF checkpointing (saving Fluent .cas and
.dat files) and migration features. LSF is also integrated with various MPI communication libraries for
distributed MPI processing, increasing the efficiency of the software and data processing.
Important:
Platform’s Standard Edition is the foundation for all LSF products; it offers users load sharing and batch scheduling across distributed Linux and Windows computing environments. Platform’s LSF Standard Edition provides the following functionality:
– prioritizes jobs
– provides limits on the number of running jobs and job resource consumption
– ensures that no job is lost when the entire network goes down
1.1.1. Requirements
The versions of LSF that are supported with Fluent are listed with the other supported job schedulers
and queuing systems posted on the Platform Support section of the Ansys Website.
The echkpnt.fluent and erestart.fluent files are available from Platform Computing or Ansys, Inc., and permit Fluent checkpointing and restarting from within LSF.
Chapter 2: Checkpointing and Restarting
LSF provides utilities to save (that is, checkpoint) and restart an application. The Fluent and LSF integration allows Fluent to take advantage of the checkpoint and restart features of LSF. At the end of each iteration, Fluent looks for the existence of a checkpoint or checkpoint-exit file. If Fluent detects the checkpoint file, it writes a case and data file, removes the checkpoint file, and continues iterating. If Fluent detects the checkpoint-exit file, it writes a case file and data file, then exits. LSF’s bchkpnt utility can be used to create the checkpoint and checkpoint-exit files, thereby forcing Fluent to checkpoint itself, or checkpoint and terminate itself. In addition to writing a case file and data file, Fluent also creates a simple journal file with instructions to read the checkpointed case file and data file and continue iterating. Fluent uses that journal file when restarted with LSF’s brestart utility. For more details on checkpointing features and options within Fluent, see the Fluent User's Guide.
The greatest benefit of the checkpoint facility occurs when it is used on an automatic basis. By starting jobs with a periodic checkpoint, LSF automatically restarts, from the last checkpoint, any jobs that are lost due to host failure. This facility can dramatically reduce lost compute time, and also avoids the task of manually restarting failed jobs.
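For example, the following sequence illustrates the cycle (the job ID 1234 and the checkpoint directory /home/username are placeholders; the bchkpnt and brestart options are described in Working with Fluent Jobs):
bchkpnt 1234                    # checkpoint: write case and data files, keep iterating
bchkpnt -k 1234                 # checkpoint, then exit
brestart /home/username 1234    # restart from the job subdirectory of the checkpoint directory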
periodically while running the job. Fluent does not perform any checkpointing unless it finds the LSF
trigger file in the job subdirectory. Fluent removes the trigger file after checkpointing the job.
Each time a job is restarted, it is assigned a new job ID, and a new job subdirectory is created in the
checkpoint directory. Files in the checkpoint directory are never deleted by LSF, but you may choose
to remove old files once the Fluent job is finished and the job history is no longer required.
Chapter 3: Configuring LSF for Fluent
LSF provides special versions of echkpnt and erestart called echkpnt.fluent and erestart.fluent to allow checkpointing with Fluent. You must make sure LSF uses these files instead of the standard versions.
• Copy the echkpnt.fluent and erestart.fluent files to the $LSF_SERVERDIR for each architecture that is desired.
• When submitting the job from the command line, include the -a fluent parameter when specifying
the checkpoint information (see Submitting a Fluent Job from the Command Line (p. 17) for details).
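For example, on each architecture (the source path is illustrative):
cp <path_to_files>/echkpnt.fluent <path_to_files>/erestart.fluent $LSF_SERVERDIR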
Important:
Note that LSF includes an email notification utility that sends email notices to users
when an LSF job has been completed. If a user submits a batch job to LSF and the email
notification utility is enabled, LSF will distribute an email containing the output for the
particular LSF job. When a Fluent job is run under LSF with the -g option, the email
will also contain information from the Fluent console.
Chapter 4: Working with Fluent Jobs
Information in this chapter is provided in the following sections:
4.1. Submitting a Fluent Job from the Command Line
4.2. Submitting a Fluent Job Using Fluent Launcher
4.3. Using Custom LSF Scripts
4.4. Manually Checkpointing Fluent Jobs
4.5. Restarting Fluent Jobs
4.6. Migrating Fluent Jobs
4.7. Coupling LSF Job Submissions and ANSYS Licensing
4.1. Submitting a Fluent Job from the Command Line
The command line syntax to run Fluent under LSF is as follows:
fluent <solver_version> [<Fluent_options>] -scheduler=lsf
where
• <solver_version> specifies the dimensionality of the problem and the precision of the Fluent calculation
(for example, 3d, 2ddp).
• <Fluent_options> can be added to specify the startup option(s) for Fluent, including the options for
running Fluent in parallel. For more information, see the Fluent User's Guide.
• -scheduler=lsf is added to the Fluent command to specify that you are running under LSF. This
option causes Fluent to check for trigger files in the checkpoint directory if the environment variable
LSB_CHKPNT_DIR is set.
• -scheduler_list_queues lists all available queues. Note that Fluent will not launch when this
option is used.
• -scheduler_opt=<opt> enables an additional option <opt> that is relevant for LSF; see the LSF
documentation for details. Note that you can include multiple instances of this option when you
want to use more than one scheduler option.
• -scheduler_nodeonly allows you to specify that Cortex and host processes are launched before
the job submission and that only the parallel node processes are submitted to the scheduler.
• -scheduler_stderr=<err-file> sets the name / directory of the scheduler standard error file to
<err-file>; by default it is saved as fluent.<PID>.e in the working directory, where <PID> is the
process ID of the top-level Fluent startup script.
• -scheduler_stdout=<out-file> sets the name / directory of the scheduler standard output file
to <out-file>; by default it is saved as fluent.<PID>.o in the working directory, where <PID> is
the process ID of the top-level Fluent startup script.
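For example, the following command (journal and output file names are illustrative) submits an 8-process, 3D double-precision batch job under LSF and names the scheduler standard output and error files:
fluent 3ddp -t8 -g -i journal_file -scheduler=lsf -scheduler_stdout=run.o -scheduler_stderr=run.e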
Note that tight integration between LSF and the MPI is enabled by default for Intel MPI (the default)
and Open MPI, except when the Cortex process is launched after the job submission (which is the default
when not using -scheduler_nodeonly) and is run outside of the scheduler environment by using
the -gui_machine or -gui_machine=<hostname> option.
Important:
You must have the DISPLAY environment variable properly defined, otherwise the graphical
user interface (GUI) will not operate correctly.
1. Open Fluent Launcher (Figure 4.1: The Scheduler Tab of Fluent Launcher (Linux Version) (p. 19)) by
entering fluent without any arguments in the Linux command line.
Figure 4.1: The Scheduler Tab of Fluent Launcher (Linux Version)
c. To specify a job queue, enable the LSF Queue option and enter the queue name in the text
box.
d. Enable the Use Checkpointing option to utilize LSF checkpointing. By default, the checkpointing
directory will be the current working directory; you have the option of enabling Checkpointing
Directory and specifying a different directory, either by entering the name in the text box or by
browsing to it.
You can specify that the checkpointing is done automatically at a set time interval by enabling
the Automatic Checkpoint with Setting of Period option and entering the period (in minutes)
in the text box; otherwise, checkpointing will not occur unless you call the bchkpnt command.
e. Enable the Node Only option under Common Options to specify that Cortex and host processes
are launched before the job submission and that only the parallel node processes are submitted
to the scheduler.
f. If you experience poor graphics performance when using LSF, you may be able to improve performance by changing the machine on which Cortex (the process that manages the graphical user interface and graphics) is running. The Graphics Rendering Machine list provides the following options:
• Select First Allocated Node if you want Cortex to run on the same machine as that used for
compute node 0. This is not available if you have enabled the Node Only option.
• Select Current Machine if you want Cortex to run on the same machine used to start Fluent
Launcher.
• Select Specify Machine if you want Cortex to run on a specified machine, which you select
from the drop-down list below.
Note that tight integration between LSF and the MPI is enabled by default for Intel MPI (the
default) and Open MPI, except when the Cortex process is launched after the job submission
(which is the default when not using the Node Only option) and is run outside of the scheduler
environment by using Current Machine or Specify Machine.
4. Set up the other aspects of your Fluent simulation using the Fluent Launcher GUI items. For more
information, see the Fluent User's Guide.
Important:
You must have the DISPLAY environment variable properly defined, otherwise the graphical
user interface (GUI) will not operate correctly.
Note:
Submitting your Fluent job from the command line provides some options that are not
available in Fluent Launcher, such as specifying the scheduler job submission machine name
or setting the scheduler standard error file or standard output file. For details, see Submitting
a Fluent Job from the Command Line (p. 17).
bchkpnt <bchkpnt_options> -k <job_ID>
where
• <bchkpnt_options> are options for the job checkpointing. See the Platform LSF Reference guide for
a complete list with descriptions.
• -k is the regular option to the bchkpnt command, and specifies checkpoint and exit. The job will
be killed immediately after being checkpointed. When the job is restarted, it does not have to repeat
any operations.
• <job_ID> is the job ID of the Fluent job, which is used to specify which job to checkpoint.
brestart <bsub_options> <checkpoint_directory> <job_ID>
where
• <bsub_options> are options for the job restart. See the Platform LSF Reference guide for a complete
list with descriptions.
• <checkpoint_directory> specifies the checkpoint directory where the job subdirectory is located.
• <job_ID> is the job ID of the Fluent job, and specifies which job to restart. At this point, the restarted
job is assigned a new job ID, and the new job ID is used for checkpointing. The job ID changes each
time the job is restarted.
bmig <bmig_options> <job_ID>
where
• <bmig_options> are options for the job migration. See the Platform LSF Reference guide for a complete
list with descriptions.
• <job_ID> is the job ID of the Fluent job, and specifies which job to checkpoint and restart on the
migration target. At this point, the restarted job is assigned a new job ID, and the new job ID is used
for checkpointing. The job ID changes each time the job is restarted.
1. Copy the elim script from your ANSYS installation area to $LSF_SERVERDIR. The elim script is
located in the following directory:
<path>/ansys_inc/v231/fluent/fluent23.1.0/multiport/mpi_wrapper/bin/
where <path> is the directory in which you have installed Fluent (for example, /opt/apps/).
2. Edit the copy of the elim script to add your license server and license feature details. The following is an example where acfd_fluent, acfd_par_proc, and anshpc are ANSYS solver and parallel license features:
$ENV{'ANSYSLMD_LICENSE_FILE'} = "1055\@deva12";
my @features = qw(acfd_fluent acfd_par_proc anshpc);
3. Set the permissions to 755 and set root as the owner for the elim script.
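For example, assuming the script was copied to $LSF_SERVERDIR/elim:
chmod 755 $LSF_SERVERDIR/elim
chown root $LSF_SERVERDIR/elim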
4. Add all of your ANSYS solver license feature names and ANSYS parallel/HPC license feature names under the Resource section in the file lsf.shared, which is located in $LSF_ENVDIR in your LSF installation area. The following is an example in which acfd, acfd_fluent, acfd_solver, and acfd_fluent_solver are the ANSYS solver license feature names and anshpc, anshpc_pack, and acfd_par_proc are the ANSYS parallel/HPC license feature names.
acfd Numeric 20 N (available ANSYS Fluent Solver licenses)
acfd_fluent Numeric 20 N (available ANSYS Fluent Solver licenses)
acfd_solver Numeric 20 N (available ANSYS Fluent Solver licenses)
acfd_fluent_solver Numeric 20 N (available ANSYS Fluent Solver licenses)
anshpc Numeric 20 N (available ANSYS Fluent Parallel licenses)
anshpc_pack Numeric 20 N (available ANSYS Fluent Parallel licenses)
acfd_par_proc Numeric 20 N (available ANSYS Fluent Parallel licenses)
5. Add all of your ANSYS solver license feature names and ANSYS parallel/HPC license feature names
in the file lsf.cluster.<cluster_name> (where <cluster_name> is the name of the cluster), which
is located in $LSF_ENVDIR in your LSF installation area. The following is an example in which
acfd, acfd_fluent, acfd_solver, and acfd_fluent_solver are the ANSYS solver license
feature names and anshpc, anshpc_pack, and acfd_par_proc are the ANSYS parallel/HPC
license feature names.
# For LSF-ANSYS Licensing Coupling
Begin ResourceMap
RESOURCENAME LOCATION
acfd ([all])
acfd_fluent ([all])
acfd_solver ([all])
acfd_fluent_solver ([all])
anshpc ([all])
anshpc_pack ([all])
acfd_par_proc ([all])
End ResourceMap
6. Reconfigure the LSF daemons using the following commands to specify that they reread their configuration. Note that you need administrator privileges to implement these changes.
lsadmin reconfig
badmin reconfig
<solver_license_feature> has the following form:
<serial_lic_name>>0
where <serial_lic_name> is the name of the serial ANSYS solver license feature name. Similarly,
<parallel_license_feature> has the following form:
<parallel_lic_name>=<N>
where <parallel_lic_name> is the name of the ANSYS parallel/HPC license feature name, and <N>
is the number of processes to use.
The previous descriptions are applicable when you have a single serial and/or parallel license feature. If you have multiple serial and/or parallel license features, you must add additional <solver_license_feature> and/or <parallel_license_feature> entries, separating them with ||; additionally, you must enclose all of the <solver_license_feature> entries in a single pair of parentheses. The following is an example of submitting a Fluent job in which acfd and acfd_fluent are the ANSYS solver license feature names, anshpc and acfd_par_proc are the ANSYS parallel/HPC license feature names, and the number of processes to use is 4:
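A sketch of the resulting resource expression follows; how the two groups are combined and passed to bsub (for example, within a -R select[] requirement) depends on your LSF setup and is an assumption here:
(acfd>0 || acfd_fluent>0) && (anshpc=4 || acfd_par_proc=4)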
Chapter 5: Fluent and LSF Examples
This chapter provides various examples of running Fluent and LSF.
5.1. Examples Without Checkpointing
5.2. Examples with Checkpointing
• Serial 3D Fluent job under LSF:
fluent 3d -scheduler=lsf
• Serial 3D Fluent batch job under LSF, which reads the journal file called journal_file:
fluent 3d -g -i journal_file -scheduler=lsf
Important:
PAM is an extension of LSF that manages parallel processes by choosing the appropriate
compute nodes and launching child processes. When using Fluent on Linux, PAM is
not used to launch Fluent (so the JOB_STARTER argument of the LSF queue should
not be set). Instead, LSF will set an environment variable that contains a list of N hosts,
and Fluent will use this list to launch itself.
• Parallel 3D Fluent batch job under LSF, which uses 5 processes and reads the journal file called journal_file:
fluent 3d -t5 -g -i journal_file -scheduler=lsf
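• Parallel 3D Fluent batch job under LSF with automatic checkpointing. A plausible form of the command described below, assuming the checkpoint settings are passed through -scheduler_opt (the exact quoting may vary):
fluent 3d -g -i journal_file -scheduler=lsf -scheduler_opt="-a fluent" -scheduler_opt="-k /home/username 60"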
– In this example, the LSF -a fluent specification identifies which echkpnt/erestart combination to use, /home/username is the checkpoint directory, and the duration between automatic checkpoints is 60 minutes.
– bjobs -l <job_ID>
→ This command returns the job information about <job_ID> in the LSF system.
– bchkpnt <job_ID>
→ This command forces Fluent to write a case file, a data file, and a restart journal file at the end
of its current iteration.
– bchkpnt -k <job_ID>
→ This command forces Fluent to write a case file, a data file, and a restart journal file at the end
of its current iteration.
→ The files are saved in a directory named <checkpoint_directory>/<job_ID> and then Fluent exits.
The <checkpoint_directory> is defined through the -scheduler_opt= option in the original
fluent command.
→ This command starts a Fluent job using the latest case and data files in the <checkpoint_directory>/<job_ID> directory. The <checkpoint_directory> is defined through the -scheduler_opt= option in the original fluent command.
• Parallel 3D Fluent batch job under LSF with checkpoint/restart, which specifies /home/username
as the checkpoint directory, uses 4 processes, and reads a journal file called journal_file
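A plausible form of this command, again assuming the checkpoint settings are passed through -scheduler_opt (the exact quoting may vary):
fluent 3d -t4 -g -i journal_file -scheduler=lsf -scheduler_opt="-a fluent" -scheduler_opt="-k /home/username"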
– bjobs -l <job_ID>
→ This command returns the job information about <job_ID> in the LSF system.
– bchkpnt <job_ID>
→ This command forces parallel Fluent to write a case file, a data file, and a restart journal file at
the end of its current iteration.
– bchkpnt -k <job_ID>
→ This command forces parallel Fluent to write a case file, a data file, and a restart journal file at
the end of its current iteration.
→ This command starts a Fluent network parallel job using the latest case and data files in the
<checkpoint_directory>/<job_ID> directory.
→ The parallel job will be restarted using the same number of processes as that specified through
the -t<x> option in the original fluent command (4 in the previous example).
– bmig -m <host> 0
→ This command checkpoints all jobs (indicated by 0 job ID) for the current user and moves them
to host <host>.
Part 2: Running Fluent Under PBS Professional
This document is made available via the Ansys, Inc. website for your convenience. Contact Altair Engineering, Inc. (http://www.altair.com/) directly for support of their product.
Chapter 1: Introduction
Altair PBS Professional is an open workload management tool for local and distributed environments.
You can use PBS Professional when running Fluent simulations, and thereby control the number of jobs
running and dynamically monitor the licenses. This document provides general information about
running Fluent under PBS Professional, and is made available via the Ansys, Inc. website for your convenience. Contact Altair Engineering, Inc. (http://www.altair.com/) directly for support of their product.
1.1.1. Requirements
• Standard
– The versions of PBS Professional that are supported with Fluent are listed with the other supported
job schedulers and queuing systems posted on the Platform Support section of the Ansys Website.
– Fluent 2023 R1
• Optional
– Checkpoint restart scripts must be obtained/written to use the checkpoint restart functionality
• Dynamic resource functionality provides a way to monitor the availability of software licenses. Refer
to the PBS Professional Administrators Guide for more information on creating custom resources.
Chapter 2: Running Fluent Simulation under PBS Professional
For more information, see the following sections:
2.1. Using the Integrated PBS Professional Capability
2.2. Using Your Own Supplied Job Script
2.3. Using Altair’s Sample Script
2.4. Monitoring/Manipulating Jobs
2.1. Using the Integrated PBS Professional Capability
2.1.1. Overview
One option of using PBS Professional with Fluent is to use the PBS Professional launching capability
integrated directly into Fluent. In this mode, Fluent is started from the command line with the additional -scheduler=pbs argument. Fluent then takes responsibility for relaunching itself under PBS
Professional. This has the following advantages:
• The command line usage is very similar to the non-RMS (resource management system) usage.
The integrated PBS Professional capability is intended to simplify usage for the most common situations.
If you desire more control over the PBS Professional qsub options for more complex situations or systems (or if you are using an older version of Fluent), you can always write or adapt a PBS Professional script that starts Fluent in the desired manner (see Using Your Own Supplied Job Script (p. 41) and Using Altair’s Sample Script (p. 41) for details).
2.1.2. Usage
Information on usage is provided in the following sections:
2.1.2.1. Submitting a Fluent Job from the Command Line
2.1.2.2. Submitting a Fluent Job Using Fluent Launcher
2.1.2.1. Submitting a Fluent Job from the Command Line
The command line syntax to run Fluent under PBS Professional is as follows:
fluent <solver_version> [<Fluent_options>] -scheduler=pbs
where
• <solver_version> specifies the dimensionality of the problem and the precision of the Fluent
calculation (for example, 3d, 2ddp).
• <Fluent_options> can be added to specify the startup option(s) for Fluent, including the options
for running Fluent in parallel. For more information, see the Fluent User's Guide.
• -scheduler=pbs is added to the Fluent command to specify that you are running under PBS
Professional.
• -scheduler_list_queues lists all available queues. Note that Fluent will not launch when
this option is used.
• -scheduler_opt=<opt> enables an additional option <opt> that is relevant for PBS Professional; see the PBS Professional documentation for details. Note that you can include multiple instances of this option when you want to use more than one scheduler option.
• -scheduler_nodeonly allows you to specify that Cortex and host processes are launched
before the job submission and that only the parallel node processes are submitted to the
scheduler.
• -scheduler_stderr=<err-file> sets the name / directory of the scheduler standard error file
to <err-file>; by default it is saved as fluent.<PID>.e in the working directory, where <PID>
is the process ID of the top-level Fluent startup script.
This syntax will start the Fluent job under PBS Professional using the qsub command in batch mode. When resources are available, PBS Professional will start the job and return a job ID, usually in the form of <job_ID>.<hostname>. This job ID can then be used to query, control, or stop the job using standard PBS Professional commands, such as qstat or qdel. The job will be run out of the current working directory.
Important:
You must have the DISPLAY environment variable properly defined, otherwise the
graphical user interface (GUI) will not operate correctly.
1. Open Fluent Launcher (Figure 2.1: The Scheduler Tab of Fluent Launcher (Linux Version) (p. 38))
by entering fluent without any arguments in the Linux command line.
Figure 2.1: The Scheduler Tab of Fluent Launcher (Linux Version)
c. You can choose to make a selection from the PBS Submission Host drop-down list to specify
the PBS Pro submission host name for submitting the job, if the machine you are using to
run the launcher cannot submit jobs to PBS Pro.
d. You can choose to make a selection from the PBS Queue drop-down list to specify the
queue name.
e. Enable the Node Only option to specify that Cortex and host processes are launched before the job submission and that only the parallel node processes are submitted to the scheduler.
f. If you experience poor graphics performance when using PBS Professional, you may be able
to improve performance by changing the machine on which Cortex (the process that manages
the graphical user interface and graphics) is running. The Graphics Rendering Machine list
provides the following options:
• Select First Allocated Node if you want Cortex to run on the same machine as that used
for compute node 0. Note that this is not available if you have enabled the Node Only
option.
• Select Current Machine if you want Cortex to run on the same machine used to start
Fluent Launcher.
• Select Specify Machine if you want Cortex to run on a specified machine, which you select
from the drop-down list below.
Note that if you enable the Tight Coupling option, it will not be used if the Cortex process
is launched after the job submission (which is the default when not using the Node Only
option) and is run outside of the scheduler environment by using Current Machine or
Specify Machine.
4. Set up the other aspects of your Fluent simulation using the Fluent Launcher GUI items. For
more information, see the Fluent User's Guide.
Important:
You must have the DISPLAY environment variable properly defined, otherwise the
graphical user interface (GUI) will not operate correctly.
Note:
Submitting your Fluent job from the command line provides some options that are not
available in Fluent Launcher, such as specifying the scheduler job submission machine
name or setting the scheduler standard error file or standard output file. For details, see
Submitting a Fluent Job from the Command Line (p. 36).
2.1.3. Examples
• Submit a parallel, 4-process job using a journal file fl5s3.jou:
> fluent 3d -t4 -i fl5s3.jou -scheduler=pbs
Relaunching fluent under PBSPro
134.les29
In the previous example, note that 134.les29 is returned from qsub. 134 is the job ID, while
les29 is the hostname on which the job was started.
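The commands producing listings like the one below could be, for example (the -s flag, which adds the scheduler comment line, is an assumption; your PBS Professional version may differ):
qstat                 # list all jobs in the queue
qstat -s 134.les29    # detailed status for the job, including the comment line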
les29:
Req’d Req’d Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
------------- -------- -------- --------- ------ --- --- ------ ----- - -----
134.les29 user1 workq fluent 11958 4 4 -- -- R 00:00
Job run at Thu Jan 04 at 14:48
The first command in the previous example lists all of the jobs in the queue. The second command
lists the detailed status about the given job.
After the job is complete, the job will no longer show up in the output of the qstat command. The
results of the run will then be available in the scheduler standard output file.
2.1.4. Limitations
The integrated PBS Professional capability in Fluent 2023 R1 has the following limitations:
• The PBS Professional commands (such as qsub) must be in the user's path.
• For parallel jobs, Fluent processes are placed on available compute resources using the qsub options -l select=<N>. If you desire more sophisticated placement, you may write separate PBS Professional scripts as described in Using Your Own Supplied Job Script (p. 41).
• RMS-specific checkpointing is not available. PBS Professional only supports checkpointing via mechanisms that are specific to the operating system on SGI and Cray systems. The integrated PBS Professional capability is based on saving the process state, and is not based on the standard application restart files (for example, Fluent case and data files) on which the LSF and SGE checkpointing is based. Thus, if you need to checkpoint, you should checkpoint your jobs by periodically saving the Fluent data file via the journal file.
If you use custom PBS Professional scripts instead of relying on the standard Fluent option (either the
-scheduler=pbs option from the command line or the Use PBSPro option in Fluent Launcher, as
described in the preceding sections), your environment variables related to the job scheduler will not
be used unless you include the -scheduler_custom_script option with the Fluent options in
your script.
input
is the name of the input (journal) file.
case
is the name of the .cas file that the input file will utilize.
fluent_args
are extra Fluent arguments. As shown in the previous example, you can specify the interconnect by using the -p<interconnect> option. The available interconnects include ethernet (the default), infiniband, and crayx. The MPI is selected automatically, based on the specified interconnect.
outfile
is the name of the file to which the standard output will be sent.
mpp="true"
will tell the job script to execute the job across multiple processors.
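For example, a pbs_fluent.conf file might contain (all values are illustrative):
input="fl5s3.jou"
case="fl5s3.cas"
fluent_args="-t4 -pinfiniband"
outfile="fluent_run.out"
mpp="true"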
#We assume that if they didn’t specify arguments then they should use the
#config file
if [ "xx${input}${case}${mpp}${fluent_args}zz" = "xxzz" ]; then
  if [ -f pbs_fluent.conf ]; then
    . pbs_fluent.conf
  else
    printf "No command line arguments specified, "
    printf "and no configuration file found. Exiting \n"
  fi
fi
#Set up the license information (Note: you need to substitute your own
#port and server in this command)
export ANSYSLMD_LICENSE_FILE=<port>@<server>
echo "---------- Going to start a fluent job with the following settings:
Input: $input
Case: $case
Output: $outfile
Fluent arguments: $fluent_args"
Note that for versions of Fluent prior to 12.0, the final line of the sample script should be changed to the following:
/usr/apps/Fluent.Inc/bin/fluent $fluent_args > $outfile
Note that the resources necessary for the job (that is, <resource_requests>) should be entered with
the proper syntax. For more information about requesting resources, see the PBS Professional 7.1
Users Guide Section 4.3.
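An illustrative submission of such a script (the script name, variable list, and resource request are assumptions):
qsub -l select=4:ncpus=1 -v input,case,fluent_args,outfile,mpp pbs_fluent.sh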
2.3.4. Epilogue/Prologue
PBS Professional provides the ability to script some actions immediately before the job starts and immediately after it ends (even if it is removed). The epilogue is a good place to put any best-effort cleanup and stray-process functionality that might be necessary to clean up after a job. For instance, one could include functionality to clean up the working/scratch area for a job if it is deleted before its completion. Fluent provides a way to get rid of job-related processes in the form of a shell script in the work directory (cleanup-fluent-<host-pid>). It could be useful to have the epilogue call this script to ensure all errant processes are cleaned up after a job completes.
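A minimal sketch of such an epilogue fragment, assuming the job's scratch area is known to the script (the path is illustrative):
#!/bin/sh
# Best-effort cleanup: run any cleanup-fluent scripts the job left behind
for script in /scratch/$USER/cleanup-fluent-*; do
    [ -x "$script" ] && "$script"
done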
There are several job states to be aware of when using qstat, which are listed under the S column.
The two main states you will see are Q and R. Q indicates that the job is waiting in the queue to run.
At this point the scheduler has not found a suitable node or nodes to run the job. Once the scheduler
has found a suitable area to run the job and has sent the job to run, its state will be set to R. Full
details on the different job statuses reported by qstat can be found in the PBS Professional 7.1
Users Guide Section 6.1.1.
Part 3: Running Fluent Under SGE
This document is made available via the Ansys, Inc. website for your convenience. Contact Oracle, Inc.
(http://www.oracle.com/) directly for support of their product.
Chapter 1: Introduction
Altair Grid Engine (formerly UGE, formerly SGE) software is a distributed computing resource management tool that you can use with either the serial or the parallel version of Fluent. Throughout this manual, SGE is used to denote the Grid Engine software, whether branded Altair Grid Engine, UGE, or SGE. This document provides general information about running Fluent under SGE, and is made available via the Ansys, Inc. website for your convenience. Contact Altair (https://www.altair.com/hpc-cloud-applications) directly for support of their product.
Fluent submits a process to the SGE software, and SGE then selects the most suitable machine to process the Fluent simulation. You can configure SGE and select the criteria by which SGE determines the most suitable machine for the Fluent simulation.
Among many other features, running a Fluent simulation using SGE enables you to:
• Save the current status of the job (this is also known as checkpointing when the Fluent .cas and
.dat files are saved)
1.1.1. Requirements
• The versions of Altair Grid Engine software that are supported with Fluent are listed with the other
supported job schedulers and queuing systems posted on the Platform Support section of the
Ansys Website.
• Fluent 2023 R1
The Fluent and SGE integration makes use of the following files:
• ckpt_command.fluent
• migr_command.fluent
• sge_request
• kill-fluent
• sample_ckpt_obj
• sample_pe
Chapter 2: Configuring SGE for Fluent
2.1. General Configuration
SGE must be installed properly if checkpointing is needed or parallel Fluent is being run under SGE.
The checkpoint queues must be configured first, and they must be configured by someone with manager
or root privileges. The configuration can be performed either through the GUI qmon or the text command
qconf.
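For example, an administrator could create the required objects from the command line (the checkpoint object name is illustrative; fluent_pe is the default parallel environment name that Fluent expects):
qconf -ackpt fluent_ckpt    # add a checkpointing object (opens an editor template)
qconf -ap fluent_pe         # add a parallel environment (opens an editor template)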
When running parallel Fluent simulations, the following options are also important:
2.2. Checkpoint Configuration
SGE requires checkpointing objects to perform checkpointing operations. Fluent provides a sample checkpointing object called sample_ckpt_obj.
Checkpoint configuration also requires root or manager privileges. While creating new checkpointing
objects for Fluent, keep the default values as given in the sample/default object provided by Fluent
and change only the following values:
The queue list should contain the queues that can be used for checkpointing.
These values should only be changed when the executable files are not in the default location, in
which case the full path should be specified. All the files (that is, ckpt_command.fluent and
migr_command.fluent) should be located in a directory that is accessible from all machines where
the Fluent simulation is running. When running Fluent 2023 R1, the default location for these files is
path/ansys_inc/v231/fluent/fluent23.1.0/addons/sge, where path is the Fluent installation directory.
This value dictates where the checkpointing subdirectories are created, and hence users must have
the correct permission to this directory. Also, this directory should be visible to all machines where
the Fluent simulation is running. The default value is NONE where Fluent uses the current working
directory as the checkpointing directory.
This value dictates when checkpoints are expected to be generated. Valid values of this parameter are composed of the letters s, m, and x, in any order:
– Including s causes a job to be checkpointed, aborted, and migrated when the corresponding SGE execution daemon (execd) is shut down.
– Including m causes a checkpoint to be generated at the minimum CPU interval defined for the corresponding queue.
– Including x causes a job to be checkpointed, aborted, and migrated when the job is suspended.
2.3. Configuring Parallel Environments
Parallel environment configuration requires root or manager privileges. Change only the following values when creating a new parallel environment for Fluent:
This should contain all the queues where qtype has been set to PARALLEL.
These contain the lists of users who are allowed or denied access to the parallel environment.
This should be changed only if the kill-fluent executable is not in the default directory, in which
case the full path to the file should be given and the path should be accessible from every machine.
• slots (slots)
This should be set to a large numerical value, indicating the maximum number of slots that can be occupied by all the parallel environment jobs that are running.
Since Fluent uses fluent_pe as the default parallel environment, an administrator must define a
parallel environment with this name.
2.4. Default Request File
A default request file should be set up when using SGE with Fluent. Fluent provides a sample request file called sge_request. To learn more about how to configure and utilize this resource, see the relevant documentation available at www.oracle.com.
Individual users can set their own default arguments and options in a private general request file called
.sge_request, located in their $HOME directory. Private general request files override the options
set by the global sge_request file.
Any settings found in either the global or private default request file can be overridden by specifying
new options in the command line.
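For example, a private $HOME/.sge_request might set a default queue and checkpointing object (the names are illustrative):
-q large
-ckpt fluent_ckpt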
Chapter 3: Running a Fluent Simulation under SGE
Information in this chapter is divided into the following sections:
3.1. Submitting a Fluent Job from the Command Line
3.2. Submitting a Fluent Job Using Fluent Launcher
3.3. Using Custom SGE Scripts
3.1. Submitting a Fluent Job from the Command Line
The command line syntax to run Fluent under SGE is as follows:
fluent <solver_version> [<Fluent_options>] -scheduler=sge
where:
<solver_version>
specifies the dimensionality of the problem and the precision of the Fluent calculation (for example,
3d, 2ddp).
<Fluent_options>
specify the start-up option(s) for Fluent, including the options for running Fluent in parallel. For
more information, see the Fluent User's Guide.
-scheduler=sge
(start-up option) is added to the Fluent command to specify that you are running under SGE.
-scheduler_list_queues
(start-up option) lists all available queues. Note that Fluent will not launch when this option is used.
-scheduler_queue=<queue>
(start-up option) specifies the queue in which your job is submitted.
-scheduler_opt=<opt>
(start-up option) enables an additional option <opt> that is relevant for SGE; see the SGE docu-
mentation for details. Note that you can include multiple instances of this option when you want
to use more than one scheduler option.
The following is an example of an action you might want to take through such an option: you may
want to specify the checkpointing object and override the checkpointing option specified in the
default request file. If this option is not specified in the command line and the default general request
file contains no setting, then the Fluent simulation is unable to use checkpoints.
-scheduler_pe=<pe>
(start-up option) sets the parallel environment to <pe> when Fluent is run in parallel. This should
be used only if the -scheduler=sge option is used.
The specified <pe> must be defined by an administrator. For more information about creating a
parallel environment, refer to the sample_pe file that is located in the /addons/sge directory
in your Fluent installation area.
If Fluent is run in parallel under SGE and the -scheduler_pe= option is not specified, by default
it will attempt to utilize a parallel environment called fluent_pe. Note that fluent_pe must be
defined by an administrator if you are to use this default parallel environment.
-gui_machine=<hostname>
(start-up option) specifies that Cortex is run on a machine named <hostname> rather than automat-
ically on the same machine as that used for compute node 0. If you just include -gui_machine
(without =<hostname>), Cortex is run on the same machine used to submit the fluent command.
This option may be necessary to avoid poor graphics performance when running Fluent under SGE.
-scheduler_nodeonly
(start-up option) specifies that Cortex and host processes are launched before the job submission
and that only the parallel node processes are submitted to the scheduler.
-scheduler_headnode=<head-node>
(start-up option) specifies that the scheduler job submission machine is <head-node> (the default
is localhost).
-scheduler_workdir=<working-directory>
(start-up option) sets the working directory for the scheduler job, so that scheduler output is written
to a directory of your choice (<working-directory>) rather than the home directory or the directory
used to launch Fluent.
-scheduler_stderr=<err-file>
(start-up option) sets the name / directory of the scheduler standard error file to <err-file>; by default
it is saved as fluent.<PID>.e in the working directory, where <PID> is the process ID of the
top-level Fluent startup script.
-scheduler_stdout=<out-file>
(start-up option) sets the name / directory of the scheduler standard output file to <out-file>; by
default it is saved as fluent.<PID>.o in the working directory, where <PID> is the process ID
of the top-level Fluent startup script.
-scheduler_tight_coupling
enables tight integration between SGE and the MPI. It is supported with Intel MPI (the default), but
it will not be used if the Cortex process is launched after the job submission (which is the default
when not using -scheduler_nodeonly) and is run outside of the scheduler environment by
using the -gui_machine or -gui_machine=<hostname> option.
Important:
You must have the DISPLAY environment variable properly defined, otherwise the graphical
user interface (GUI) will not operate correctly.
The following examples demonstrate some applications of the command line syntax:
• Parallel 2D Fluent under SGE on 4 CPUs, using the parallel environment diff_pe and the queue
large
fluent 2d -t4 -scheduler=sge -scheduler_queue=large -scheduler_pe=diff_pe
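• Parallel 3D Fluent under SGE on 4 CPUs, overriding the checkpointing object set in the default request file (the object name fluent_ckpt is illustrative and must be defined by an administrator)
fluent 3d -t4 -scheduler=sge -scheduler_opt="-ckpt fluent_ckpt"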
1. Open Fluent Launcher (Figure 3.1: The Scheduler Tab of Fluent Launcher (Linux Version) (p. 58))
by entering fluent without any arguments in the Linux command line.
Figure 3.1: The Scheduler Tab of Fluent Launcher (Linux Version)
c. Enter the name of a node for SGE qmaster. SGE will allow this node to submit jobs. By default, localhost is specified for SGE qmaster. Note that the button enables you to check the job status.
d. You have the option of entering the name of a queue in which you want your Fluent job
submitted for SGE Queue. Note that you can use the button to contact the SGE qmaster
for a list of queues.
e. If you are running a parallel simulation, you must enter the name of the parallel environment
in which you want your Fluent job submitted for SGE pe. The parallel environment must be
defined by an administrator. For more information about creating a parallel environment, refer
to the sample_pe file that is located in the /addons/sge directory in your Fluent installation
area.
f. You can specify an SGE configuration file by enabling the Use SGE Settings option. Then
enter the name and location of the file in the text box or browse to the file.
g. Under Common Options:
• Enable the Node Only option to specify that Cortex and host processes are launched before
the job submission and that only the parallel node processes are submitted to the scheduler.
• Enable the Tight Coupling option to enable a job-scheduler-supported native remote node
access mechanism in Linux. This tight integration is supported with Intel MPI (the default).
h. If you experience poor graphics performance when using SGE, you may be able to improve
performance by changing the machine on which Cortex (the process that manages the
graphical user interface and graphics) is running. The Graphics Rendering Machine list
provides the following options:
• Select First Allocated Node if you want Cortex to run on the same machine as that used
for compute node 0. Note that this is not available if you have enabled the Node Only option.
• Select Current Machine if you want Cortex to run on the same machine used to start Fluent
Launcher.
• Select Specify Machine if you want Cortex to run on a specified machine, which you select
from the drop-down list below.
Note that if you enable the Tight Coupling option, it will not be used if the Cortex process
is launched after the job submission (which is the default when not using the Node Only
option) and is run outside of the scheduler environment by using Current Machine or Specify
Machine.
4. Set up the other aspects of your Fluent simulation using the Fluent Launcher GUI items. For more
information, see the Fluent User's Guide.
Important:
You must have the DISPLAY environment variable properly defined, otherwise the graphical
user interface (GUI) will not operate correctly.
Note:
Submitting your Fluent job from the command line provides some options that are not
available in Fluent Launcher, such as setting the scheduler standard error file or standard
output file. For details, see Submitting a Fluent Job from the Command Line (p. 55).
Chapter 4: Running Fluent Utility Scripts under SGE
You can run the Fluent utility scripts (for example, fe2ram, fl42seg, partition, tconv, tmerge)
with the SGE load management system.
The command line syntax to launch a Fluent utility script under SGE is as follows:
utility <utility_name> [<utility_opts>] -sge [-sgeq <queue_name>] [-sgepe <parallel_env> <MIN_N>-<MAX_N>] <utility_inputs>
where
<utility_name>
is the name of the utility to be launched (for example, fe2ram, fl42seg, partition, tconv,
tmerge).
<utility_opts>
are the options that are part of the syntax of the utility being launched. For more information about
the options for the various utilities, see the Fluent User's Guide.
-sge
(start-up option) specifies that the utility is run under SGE.
-sgeq <queue_name>
(start-up option) specifies the queue to which the utility job is submitted. This should be used
only if the -sge option is used.
-sgepe <parallel_env> <MIN_N>-<MAX_N>
(start-up option) specifies the parallel environment to be used when <utility_name> is run in parallel.
This should be used only if the -sge option is used.
The specified <parallel_env> must be defined by an administrator. For more information about
creating a parallel environment, refer to the sample_pe file that is located in the /addons/sge
directory in your Fluent installation area.
The values for <MIN_N> and <MAX_N> specify the minimum and maximum number of compute
nodes, respectively.
If <utility_name> is run in parallel under SGE and the -sgepe parameter is not specified, by default
the utility will attempt to utilize a parallel environment called utility_pe. Note that utility_pe
must be defined by an administrator if you are to use this default parallel environment. In such a
case, <MIN_N> will be set to 1 and <MAX_N> will be set to the maximum number of requested
compute nodes specified in the <utility_opts> (for example, -t4).
<utility_inputs>
are the fields that are part of the syntax of the utility being launched. For more information about
the necessary fields for the various utilities, see the Fluent User's Guide.
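For example, a hypothetical submission of the partition utility on up to 4 compute nodes might
look as follows (the queue name small, the parallel environment, and the input field sample.cas
are placeholders; see the Fluent User's Guide for the exact options and inputs each utility expects):
utility partition -t4 -sge -sgeq small -sgepe utility_pe 1-4 sample.cas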
Important:
Note that checkpointing, restarting, and migrating are not available when using the
utility scripts under SGE.
Part 4: Running Fluent Under Slurm
Chapter 1: Introduction
Slurm is an open-source workload management tool for local and distributed environments. You can
use Slurm when running Fluent simulations. This document provides general information about running
Fluent under Slurm, and is made available via the Ansys, Inc. website for your convenience.
• Standard installations of the following:
– A version of Slurm that is supported with Fluent (the supported versions are listed with the
other supported job schedulers and queuing systems posted on the Platform Support section
of the Ansys website)
– Fluent 2023 R1
Important:
Chapter 2: Running Fluent Simulation under Slurm
For more information, see the following sections:
2.1. Using the Integrated Slurm Capability
2.2. Using Your Own Supplied Job Script
2.3. Epilogue/Prologue
2.4. Monitoring/Manipulating Jobs
2.1. Using the Integrated Slurm Capability
2.1.1. Overview
One option for using Slurm with Fluent is the Slurm launching capability integrated directly
into Fluent. In this mode, Fluent is started from the command line with the additional
-scheduler=slurm argument. Fluent then takes responsibility for relaunching itself under Slurm. This has
the following advantages:
• The command line usage is very similar to the non-RMS (resource management system) usage.
The integrated Slurm capability is intended to simplify usage for the most common situations. If you
desire more control over the Slurm sbatch options for more complex situations or systems (or if
you are using an older version of Fluent), you can always write or adapt a Slurm script that starts
Fluent in the desired manner (see Using Your Own Supplied Job Script (p. 74) for details).
2.1.2. Usage
Information on usage is provided in the following sections:
2.1.2.1. Submitting a Fluent Job from the Command Line
2.1.2.2. Submitting a Fluent Job Using Fluent Launcher
2.1.2.1. Submitting a Fluent Job from the Command Line
To submit your Fluent job under Slurm from the command line, the basic syntax is as follows:
fluent <solver_version> [<Fluent_options>] -scheduler=slurm
where
• <solver_version> specifies the dimensionality of the problem and the precision of the Fluent
calculation (for example, 3d, 2ddp).
• <Fluent_options> can be added to specify the startup option(s) for Fluent, including the options
for running Fluent in parallel. For more information, see the Fluent User's Guide.
• -scheduler=slurm is added to the Fluent command to specify that you are running under
Slurm.
• -scheduler_list_queues lists all available queues. Note that Fluent will not launch when
this option is used.
• -scheduler_opt=<opt> enables an additional option <opt> that is relevant for Slurm; see
the Slurm documentation for details. Note that you can include multiple instances of this option
when you want to use more than one scheduler option.
• -scheduler_nodeonly allows you to specify that Cortex and host processes are launched
before the job submission and that only the parallel node processes are submitted to the
scheduler.
• -scheduler_stderr=<err-file> sets the name / directory of the scheduler standard error file
to <err-file>; by default it is saved as fluent.<PID>.e in the working directory, where <PID>
is the process ID of the top-level Fluent startup script.
• -scheduler_ppn=<x> sets the number of node processes per cluster node to <x> (rather than
leaving it to the cluster configuration).
• -scheduler_gpn=<x> sets the number of graphics processing units (GPUs) per cluster node
to <x> (by default, it is set to 0).
This syntax submits the Fluent job under Slurm as a batch job using the sbatch command
and returns a job ID. This job ID can then be used to query, control, or stop the job using standard
Slurm commands, such as squeue or scancel. Slurm will start the job when resources are available.
The job will be run out of the current working directory.
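For instance, the following hypothetical submission requests 8 processes with 4 processes per
cluster node and passes two additional sbatch options through to Slurm (the option values are
illustrative):
fluent 3ddp -t8 -scheduler=slurm -scheduler_ppn=4 -scheduler_opt=--exclusive -scheduler_opt=--time=02:00:00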
Note:
• You must have the DISPLAY environment variable properly defined, otherwise the
graphical user interface (GUI) will not operate correctly.
• Dynamic spawning (that is, automatically spawning additional parallel node processes
when switching from meshing mode to solution mode to achieve the requested
number of total solution processes) is not allowed under Slurm, except when the
-gui_machine=<hostname> or -gui_machine option is also used.
• The combination of Slurm + Open MPI + distributed memory on a cluster is not sup-
ported, except when the -gui_machine=<hostname> or -gui_machine option
is also used.
• Tight integration between Slurm and the MPI is enabled by default for Intel MPI (the
default), except when the Cortex process is launched after the job submission (which
is the default when not using -scheduler_nodeonly) and is run outside of the
scheduler environment by using the -gui_machine or -gui_machine=<hostname>
option.
• Process binding (affinity) is managed by Slurm instead of Fluent when you do not use
the -gui_machine=<hostname> or -gui_machine option.
1. Open Fluent Launcher (Figure 2.1: The Scheduler Tab of Fluent Launcher (Linux Version) (p. 72))
by entering fluent without any arguments in the Linux command line.
c. If the machine you are using to run the launcher cannot submit jobs to Slurm, you can make
a selection from the Slurm Submission Host drop-down list to specify the host that submits
the job.
d. You can make a selection from the Slurm Partition drop-down list to request a
specific partition for the resource allocation.
e. You can make a selection from the Slurm Account drop-down list to specify the
Slurm account.
f. Enable the Node Only option under Common Options to specify that Cortex and host
processes are launched before the job submission and that only the parallel node processes
are submitted to the scheduler.
g. If you experience poor graphics performance when using Slurm, you may be able to improve
performance by changing the machine on which Cortex (the process that manages the
graphical user interface and graphics) is running. The Graphics Rendering Machine list
provides the following options:
• Select First Allocated Node if you want Cortex to run on the same machine as that used
for compute node 0. This is not available if you have enabled the Node Only option.
• Select Current Machine if you want Cortex to run on the same machine used to start
Fluent Launcher.
• Select Specify Machine if you want Cortex to run on a specified machine, which you select
from the drop-down list below.
Note:
• Tight integration between Slurm and the MPI is enabled by default for Intel
MPI (the default), except when the Cortex process is launched after the job
submission (which is the default when not using the Node Only option) and
is run outside of the scheduler environment by using Current Machine or
Specify Machine.
• Process binding (affinity) is managed by Slurm instead of Fluent when you have
selected First Allocated Node from the Graphics Rendering Machine list.
4. If you want to set the number of node processes per cluster node (rather than leaving it to the
cluster configuration), click the Environment tab and define it using the FL_SCHEDULER_PPN
environment variable.
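For example, to request 8 node processes per cluster node, you would define FL_SCHEDULER_PPN
with a value of 8 in the Environment tab; the presumed shell equivalent, if you instead start Fluent
from the command line (assuming a bash-style shell), is:
export FL_SCHEDULER_PPN=8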
5. Set up the other aspects of your Fluent simulation using the Fluent Launcher GUI items. For
more information, see the Fluent User's Guide.
Important:
You must have the DISPLAY environment variable properly defined, otherwise the
graphical user interface (GUI) will not operate correctly.
Note:
Submitting your Fluent job from the command line provides some options that are not
available in Fluent Launcher, such as setting the scheduler standard error file or standard
output file. For details, see Submitting a Fluent Job from the Command Line (p. 70).
2.1.3. Examples
• Submit a parallel, 4-process job using a journal file fl5s3.jou:
> fluent 3d -t4 -i fl5s3.jou -scheduler=slurm
Starting sbatch < user-scheduler-14239.slurm
/bin/sbatch
Submitted batch job 524
In the previous example, note that sbatch returns 524 as the job ID, and user-scheduler-
14239.slurm is the name of the Slurm script written by Fluent for submitting this job.
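• Check the status of the job using the returned job ID (a typical query; exact squeue flags can
vary by site and Slurm version):
> squeue -j 524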
The command in the previous example lists the status of the given job. The squeue command can
also be used on its own to list all the jobs in the queue.
After the job is complete, the job will no longer show up in the output of the squeue command.
The results of the run will then be available in the scheduler standard output file.
2.1.4. Limitations
The integrated Slurm capability in Fluent 2023 R1 has the following limitations:
2.2. Using Your Own Supplied Job Script
If you use custom Slurm scripts instead of relying on the standard Fluent option (either the
-scheduler=slurm option from the command line or the Use Slurm option in Fluent Launcher, as described
in the preceding sections), your environment variables related to the job scheduler will not be used
unless you include the -scheduler_custom_script option with the Fluent options in your script.
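A minimal sketch of such a script is shown below, assuming a 3D, 4-process batch run driven by a
journal file; the job name, file names, and resource requests are placeholders:
#!/bin/bash
#SBATCH --job-name=fluent-run     # illustrative job name
#SBATCH --ntasks=4                # total number of node processes
#SBATCH --output=fluent.%j.out    # scheduler standard output file

# Run Fluent in batch mode (-g) inside the allocation; including
# -scheduler_custom_script makes Fluent honor your scheduler-related
# environment variables, as noted above.
fluent 3d -g -t4 -i run.jou -scheduler_custom_script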
2.3. Epilogue/Prologue
Slurm provides the ability to script some actions immediately before the job starts and immediately
after it ends (even if it is removed). The epilogue is a good place to put any best-effort cleanup and
stray-process removal that might be necessary after a job. For instance, one could include
functionality to clean up the working/scratch area for a job if it is deleted before its completion. Fluent
provides a way to get rid of job-related processes in the form of a shell script in the work directory
(cleanup-fluent-<host-pid>). It can be useful to have the epilogue call this script to ensure all
errant processes are cleaned up after a job completes.
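The following is a minimal sketch of such an epilogue, assuming the job's working directory is known
to the script (the epilogue environment does not include the submit directory by default, so the path
below is a placeholder to be resolved per site policy):
#!/bin/bash
# Sketch of a Slurm epilogue that invokes Fluent's cleanup scripts.
WORKDIR=/path/to/job/workdir       # placeholder: resolve per site policy
for f in "$WORKDIR"/cleanup-fluent-*; do
    [ -x "$f" ] && "$f"            # run each cleanup-fluent-<host-pid> script
done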
2.4. Monitoring/Manipulating Jobs
There are several job states to be aware of when using squeue; they are listed under the ST column.
The two main states you will see are PD and R. PD indicates that the job is waiting in the queue to
run; at this point the scheduler has not found a suitable node or nodes to run the job. Once the
scheduler has found suitable resources and has sent the job to run, its state will be set to R.
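For example, a running Fluent job might appear as follows (the output is illustrative; columns vary
with the Slurm version and configuration):
> squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  524     debug   fluent     user  R       1:02      2 node[01-02]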