Parallel Processing Guide
ANSYS, Inc.
Southpointe
275 Technology Drive
Canonsburg, PA 15317
[email protected]
http://www.ansys.com
(T) 724-746-3304
(F) 724-514-9494
Disclaimer Notice
THIS ANSYS SOFTWARE PRODUCT AND PROGRAM DOCUMENTATION INCLUDE TRADE SECRETS AND ARE CONFIDENTIAL AND PROPRIETARY PRODUCTS OF ANSYS, INC., ITS SUBSIDIARIES, OR LICENSORS. The software products and documentation are furnished by ANSYS, Inc., its subsidiaries, or affiliates under a software license agreement that contains provisions concerning non-disclosure, copying, length and nature of use, compliance with exporting laws, warranties, disclaimers, limitations of liability, and remedies, and other provisions. The software products and documentation may be used, disclosed, transferred, or copied only in accordance with the terms and conditions of that software license agreement. ANSYS, Inc. is certified to ISO 9001:2008.
Third-Party Software
See the legal information in the product help files for the complete Legal Notice for ANSYS proprietary software and third-party software. If you are unable to access the Legal Notice, please contact ANSYS, Inc. Published in the U.S.A.
Table of Contents
1. Overview of Parallel Processing
  1.1. Parallel Processing Terminology
    1.1.1. Hardware Terminology
    1.1.2. Software Terminology
  1.2. HPC Licensing
2. Using Shared-Memory ANSYS
  2.1. Activating Parallel Processing in a Shared-Memory Architecture
    2.1.1. System-Specific Considerations
  2.2. Troubleshooting
3. GPU Accelerator Capability
  3.1. Activating the GPU Accelerator Capability
  3.2. Supported Analysis Types and Features
    3.2.1. Supported Analysis Types
    3.2.2. Supported Features
  3.3. Troubleshooting
4. Using Distributed ANSYS
  4.1. Configuring Distributed ANSYS
    4.1.1. Prerequisites for Running Distributed ANSYS
      4.1.1.1. MPI Software
      4.1.1.2. Installing the Software
    4.1.2. Setting Up the Cluster Environment for Distributed ANSYS
      4.1.2.1. Optional Setup Tasks
      4.1.2.2. Using the mpitest Program
      4.1.2.3. Interconnect Configuration
  4.2. Activating Distributed ANSYS
    4.2.1. Starting Distributed ANSYS via the Launcher
    4.2.2. Starting Distributed ANSYS via Command Line
    4.2.3. Starting Distributed ANSYS via the HPC Job Manager
    4.2.4. Starting Distributed ANSYS in ANSYS Workbench
    4.2.5. Using MPI appfiles
    4.2.6. Controlling Files that Distributed ANSYS Writes
  4.3. Supported Analysis Types and Features
    4.3.1. Supported Analysis Types
    4.3.2. Supported Features
  4.4. Understanding the Working Principles and Behavior of Distributed ANSYS
    4.4.1. Differences in General Behavior
    4.4.2. Differences in Solution Processing
    4.4.3. Differences in Postprocessing
    4.4.4. Restarts in Distributed ANSYS
  4.5. Example Problems
    4.5.1. Example: Running Distributed ANSYS on Linux
    4.5.2. Example: Running Distributed ANSYS on Windows
  4.6. Troubleshooting
    4.6.1. Setup and Launch Issues
    4.6.2. Solution and Performance Issues
Index
List of Tables
4.1. Parallel Capability in Shared-Memory and Distributed ANSYS
4.2. Platforms and MPI Software
4.3. LS-DYNA MPP MPI Support on Windows and Linux
4.4. Required Files for Multiframe Restarts
1. Overview of Parallel Processing

The GPU accelerator capability refers to our software offering which allows the program to take advantage of certain GPU (graphics processing unit) hardware to accelerate the speed of the solver computations.
Distributed-memory hardware

Distributed ANSYS
Distributed-memory parallelism is used throughout the solution (for example, matrix generation, linear equation solving, and results calculations). Pre- and postprocessing do not make use of distributed-memory parallel processing; however, these steps can make use of shared-memory parallelism. See Using Distributed ANSYS (p. 15) for more details.

GPU accelerator capability
This capability takes advantage of the highly parallel architecture of the GPU hardware to accelerate the speed of solver computations and, therefore, reduce the time required to complete a simulation in ANSYS. Some computations of certain equation solvers can be off-loaded from the CPU(s) to the GPU, where they are often executed much faster. The CPU core(s) will continue to be used for all other computations in and around the equation solvers. For more information, see GPU Accelerator Capability (p. 9).
Shared-memory ANSYS can only be run on shared-memory hardware. However, Distributed ANSYS can be run on either shared-memory hardware or distributed-memory hardware. While both forms of hardware can achieve a significant speedup with Distributed ANSYS, only running on distributed-memory hardware allows you to take advantage of increased resources (for example, available memory and disk space, as well as memory and I/O bandwidths) by using multiple machines.

Currently, only a single GPU accelerator device per machine (e.g., desktop workstation or single compute node of a cluster) can be utilized by the ANSYS program during a solution. The GPU accelerator capability can be used with either shared-memory ANSYS or Distributed ANSYS.
The HPC license options described here do not apply to ANSYS LS-DYNA; see the ANSYS LS-DYNA User's Guide for details on parallel processing options with ANSYS LS-DYNA.
2. Select the correct environment and license.
3. Go to the High Performance Computing Setup tab.
4. Select Use Shared-Memory Parallel (SMP).
5. Specify the number of cores to use.

Alternatively, you can specify the number of cores to use via the -np command line option:
ansys145 -np N
where N represents the number of cores to use. For large multiprocessor servers, ANSYS, Inc. recommends setting N to a value no higher than the number of available cores minus one. For example, on an eight-core system, set N to 7.
However, on multiprocessor workstations, you may want to use all available cores to minimize the total solution time. The program automatically limits the maximum number of cores used to be less than or equal to the number of physical cores on the machine. This is done to avoid running the program on virtual cores (e.g., by means of hyperthreading), which typically results in poor per-core performance. For optimal performance, consider closing down all other applications before launching ANSYS.

6. If working from the launcher, click Run to launch ANSYS.
7. Set up and run your analysis as you normally would.
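For example, a complete batch run on an eight-core workstation could be launched as shown below. This is a hedged illustration: input1 and output1 are placeholder file names, and -b, -i, and -o are the batch, input, and output options used elsewhere in this guide.

ansys145 -np 7 -b -i input1 -o output1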
2.2. Troubleshooting
This section describes problems which you may encounter while using shared-memory ANSYS, as well as methods for overcoming these problems. Some of these problems are specific to a particular system, as noted.

Job fails with SIGTERM signal (Linux Only)
Occasionally, when running on Linux, a simulation may fail with the following message: process killed (SIGTERM). This typically occurs when computing the solution and means that the system has killed the ANSYS process. The two most common causes are (1) ANSYS is using too much of the hardware resources and the system has killed the ANSYS process, or (2) a user has manually killed the ANSYS job (i.e., with the kill -9 system command). Users should check the size of the job they are running in relation to the amount of physical memory on the machine. Most often, decreasing the model size or finding a machine with more RAM will result in a successful run.

Poor Speedup or No Speedup
As more cores are utilized, the runtimes are generally expected to decrease. The biggest relative gains are typically achieved when using two cores compared to using a single core. When significant speedups are not seen as additional cores are used, the reasons may involve both hardware and software issues. These include, but are not limited to, the following situations.

Hardware

Oversubscribing hardware
In a multiuser environment, this could mean that more physical cores are being used by ANSYS simulations than are available on the machine. It could also mean that hyperthreading is activated. Hyperthreading typically involves enabling extra virtual cores, which can sometimes allow software programs to more effectively use the full processing power of the CPU. However, for compute-intensive programs such as ANSYS, using these virtual cores rarely
provides a significant reduction in runtime. Therefore, it is recommended you do not use hyperthreading when running the ANSYS program; if hyperthreading is enabled, it is recommended you do not exceed the number of physical cores.

Lack of memory bandwidth
On some systems, using most or all of the available cores can result in a lack of memory bandwidth. This lack of memory bandwidth can impact the overall scalability of the ANSYS software.

Dynamic Processor Speeds
Many new CPUs have the ability to dynamically adjust the clock speed at which they operate based on the current workloads. Typically, when only a single core is being used, the clock speed can be significantly higher than when all of the CPU cores are being utilized. This can have a negative impact on scalability, as the per-core computational performance can be much higher when only a single core is active versus the case when all of the CPU cores are active.

Software

Simulation includes non-supported features
The shared- and distributed-memory parallelisms work to speed up certain compute-intensive operations in /PREP7, /SOLU, and /POST1. However, not all operations are parallelized. If a particular operation that is not parallelized dominates the simulation time, then using additional cores will not help achieve a faster runtime.

Simulation has too few DOF (degrees of freedom)
Some analyses (such as transient analyses) may require long compute times, not because the number of DOF is large, but because a large number of calculations are performed (i.e., a very large number of time steps). Generally, if the number of DOF is relatively small, parallel processing will not significantly decrease the solution time. Consequently, for small models with many time steps, parallel performance may be poor because the model size is too small to fully utilize a large number of cores.

I/O cost dominates solution time
For some simulations, the amount of memory required to obtain a solution is greater than the physical memory (i.e., RAM) available on the machine. In these cases, either virtual memory (i.e., hard disk space) is used by the operating system to hold the data that would otherwise be stored in memory, or the equation solver writes extra files to the disk to store data. In both cases, the extra I/O done using the hard drive can significantly impact performance, making the I/O performance the main bottleneck to achieving optimal performance. In these cases, using additional cores will typically not result in a significant reduction in overall time to solution.

Different Results Relative to a Single Core
Shared-memory parallel processing occurs in various preprocessing, solution, and postprocessing operations. Operational randomness and numerical round-off inherent to parallelism can cause slightly different results between runs on the same machine using the same number of cores or different numbers of cores. This difference is often negligible. However, in some cases the difference is appreciable. This sort of behavior is most commonly seen on nonlinear static or transient analyses which are numerically unstable. The more numerically unstable the model is, the more likely the convergence pattern or final results will differ as the number of cores used in the simulation is changed.

With shared-memory parallelism, you can use the PSCONTROL command to control which operations actually use parallel behavior.
For example, you could use this command to show that the element matrix generation running in parallel is causing a nonlinear job to converge to a slightly different solution each time it runs (even on the same machine with no change to the input data). This can help isolate parallel computations which are affecting the solution while maintaining as much other parallelism as possible to continue to reduce the time to solution.
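A minimal APDL sketch of that diagnostic approach is shown below. The option labels used here are hypothetical placeholders; consult the PSCONTROL entry in the Command Reference for the exact labels and keys.

PSCONTROL,FORM,OFF   ! hypothetical label: run element matrix generation serially
SOLVE                ! rerun the nonlinear solution and compare the convergence history
PSCONTROL,ALL,ON     ! hypothetical label: restore parallel behavior for all operations afterward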
3. GPU Accelerator Capability

GPU acceleration is allowed when using one or more ANSYS HPC Pack licenses. One HPC Pack enables one GPU (in addition to enabling up to eight traditional cores). For more information about HPC Packs, see HPC Licensing in the ANSYS, Inc. Licensing Guide.

The following GPU accelerator topics are available:
3.1. Activating the GPU Accelerator Capability
3.2. Supported Analysis Types and Features
3.3. Troubleshooting
3. Select the correct environment and license.
4. Go to the High Performance Computing Setup tab. Select Use GPU Accelerator Capability.
5. Alternatively, you can activate the GPU accelerator capability via the -acc command line option:
ansys145 -acc nvidia -na N
The -na command line option followed by a number (N) indicates the number of GPU accelerator devices to use per machine or compute node. If only the -acc option is specified, the program uses a single GPU device per machine or compute node by default (that is, -na 1).

6. If working from the launcher, click Run to launch ANSYS.
7. Set up and run your analysis as you normally would.
Note
The High Performance Computing Setup tab of the Product Launcher does not allow you to specify GPU acceleration in conjunction with distributed-memory parallel processing, nor does it allow you to specify multiple GPU devices per machine or compute node. If either or both of these capabilities is desired, you must select the Customization/Preferences tab in the Product Launcher and input -acc nvidia -na N in the Additional Parameters field. Alternatively, you can use the command line to launch ANSYS.

With the GPU accelerator capability, the acceleration obtained by using the parallelism on the GPU hardware occurs only during the solution operations. Operational randomness and numerical round-off inherent to any parallel algorithm can cause slightly different results between runs on the same machine when using or not using the GPU hardware to accelerate the simulation.
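For instance, the following command line combines distributed-memory parallel processing with two GPU devices per machine by using the -dis, -np, -acc, and -na options described in this guide. This is a hedged illustration; the core and device counts are arbitrary.

ansys145 -dis -np 8 -acc nvidia -na 2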
The ACCOPTION command can also be used to control activation of the GPU accelerator capability.
3.3. Troubleshooting
This section describes problems which you may encounter while using the GPU accelerator capability, as well as methods for overcoming these problems. Some of these problems are specific to a particular system, as noted.

Note that GPU acceleration is not supported on the following platforms: Windows 32-bit and Linux x64 with SUSE Linux Enterprise 10.

To list the GPU devices installed on the machine, set the ANSGPU_PRINTDEVICES environment variable to a value of 1. The printed list may include graphics cards used for display purposes as well as any graphics cards used to accelerate your simulation.

No Supported Devices
Be sure that a supported GPU device is properly installed and configured. Check the driver level to be sure it is current or newer than the driver version supported by ANSYS for your particular device. (See the GPU requirements outlined in the Windows Installation Guide and the Linux Installation Guide.)
Note
On Windows, the use of Remote Desktop may disable the use of a GPU device. Launching Mechanical APDL through the ANSYS Remote Solve Manager (RSM) when RSM is installed as a service may also disable the use of a GPU. In these two scenarios, the GPU Accelerator Capability cannot be used. Using the TCC (Tesla Compute Cluster) driver mode, if applicable, can circumvent this restriction.

No Valid Devices
A GPU device was detected, but it is not a GPU device supported by ANSYS. Be sure that a supported GPU device is properly installed and configured. Check the driver level to be sure it is current or newer than the driver version supported by ANSYS for your particular device. (See the GPU requirements outlined in the Windows Installation Guide and the Linux Installation Guide.)

Poor Acceleration or No Acceleration

Simulation includes non-supported features
A GPU device will only accelerate certain portions of the ANSYS code, mainly the solution time. If the bulk of the simulation time is spent outside of solution, the GPU cannot have a significant impact on the overall analysis time. Even if the bulk of the simulation is spent inside solution, you must be sure that a supported equation solver is utilized during solution and that no unsupported options are used. Messages are printed in the output to alert users when a GPU is being used, as well as when unsupported options/features are chosen which deactivate the GPU accelerator capability.

Simulation does not fully utilize the GPU
Only simulations that spend a lot of time performing calculations that are supported on a GPU can expect to see significant speedups when a GPU is used. Only certain computations are supported for GPU acceleration. Therefore, users should check to ensure that a high percentage of the solution time was spent performing computations that could possibly be accelerated on a GPU. This can be done by reviewing the equation solver statistics files as described below. See Measuring ANSYS Performance in the Performance Guide for more details on the equation solver statistics files.

PCG solver file: The .PCS file contains statistics for the PCG iterative solver. You should first check to make sure that the GPU was utilized by the solver. This can be done by looking at the line which begins with: Number of cores used. The string GPU acceleration enabled will be added to this line if the GPU hardware was used by the solver. If this string is missing, the GPU
was not used for that call to the solver. Next, you should study the elapsed times for both the Preconditioner Factoring and Multiply With A22 computations. GPU hardware is only used to accelerate these two sets of computations. The wall clock (or elapsed) times for these computations are the areas of interest when determining how much GPU acceleration is achieved.

Sparse solver files: The .BCS (or .DSP) file contains statistics for the sparse direct solver. You should first check to make sure that the GPU was utilized by the solver. This can be done by looking for the following line: GPU acceleration activated. This line will be printed if the GPU hardware was used. If this line is missing, the GPU was not used for that call to the solver. Next, you should check the percentage of factorization computations (flops) which were accelerated on a GPU. This is shown by the line: percentage of GPU accelerated flops. Also, you should look at the time to perform the matrix factorization, shown by the line: time (cpu & wall) for numeric factor. GPU hardware is only used to accelerate the matrix factor computations. These lines provide some indication of how much GPU acceleration is achieved.

Eigensolver files: The .BCS file is written for the Block Lanczos eigensolver and can be used as described above for the sparse direct solver. The .PCS file is written for the PCG Lanczos eigensolver and can be used as described above for the PCG iterative solver.

Simulation has too few DOF (degrees of freedom)
Some analyses (such as transient analyses) may require long compute times, not because the number of DOF is large, but because a large number of calculations are performed (i.e., a very large number of time steps). Generally, if the number of DOF is relatively small, GPU acceleration will not significantly decrease the solution time. Consequently, for small models with many time steps, GPU acceleration may be poor because the model size is too small to fully utilize a GPU.

Using multiple GPU devices
When using the sparse solver in a shared-memory parallel solution (shared-memory ANSYS), it is expected that running a simulation with multiple GPU devices will not improve performance compared to running with a single GPU device. In a shared-memory parallel solution, the sparse solver can only make use of one GPU device.

Oversubscribing GPU hardware
The program automatically determines which GPU devices to use. In a multiuser environment, this could mean that one or more of the same GPUs are picked for multiple simultaneous ANSYS simulations, thus oversubscribing the hardware. If only a single GPU accelerator device exists in the machine, then only a single user should attempt to make use of it, much in the same way users should avoid oversubscribing their CPU cores. If multiple GPU accelerator devices exist in the machine, you can set the ANSGPU_DEVICE environment variable, in conjunction with the ANSGPU_PRINTDEVICES environment variable mentioned above, to tell ANSYS which particular GPU accelerator device to use during the solution. Note that this only applies if you plan to use a single GPU accelerator device. For example, if ANSGPU_PRINTDEVICES shows that three GPU devices are available but only the second and third devices are supported for GPU acceleration in ANSYS, you may want to select the second supported GPU device by setting ANSGPU_DEVICE equal to the corresponding device ID value displayed in the list of GPU devices.
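A quick way to perform these checks from a shell is sketched below. This is a hedged illustration: it assumes the csh-style setenv syntax used elsewhere in this guide, a default job name of file for the statistics files, and an arbitrary device ID of 2.

setenv ANSGPU_PRINTDEVICES 1                  # list the GPU devices the program detects
setenv ANSGPU_DEVICE 2                        # select a specific device ID from that list
grep "GPU acceleration enabled" file.PCS      # PCG solver statistics: confirm the GPU was used
grep "GPU acceleration activated" file.BCS    # sparse solver statistics: confirm the GPU was used
grep "percentage of GPU accelerated flops" file.BCS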
Some GPU devices support an exclusive mode that allows an application to lock the usage of the GPU so that no other application can access the GPU until the first application completes. In a multiuser environment, this setting may be helpful to avoid oversubscribing the GPU hardware. However, this mode is not recommended for Distributed ANSYS users because each Distributed ANSYS process needs to lock onto the GPU. For example, if running Distributed ANSYS on
eight cores and one GPU, eight processes will attempt to access the one GPU. In exclusive mode this would fail, and Distributed ANSYS would fail to launch.
4. Using Distributed ANSYS

The master process performs the pre- and postprocessing steps and may use shared-memory parallelism to improve performance of these operations. During this time, the slave processes wait to receive new commands from the master process. Once the SOLVE command is issued, it is communicated to the slave processes and all Distributed ANSYS processes become active. At this time, the program makes a decision as to which mode to use when computing the solution. In some cases, the solution will proceed using only a distributed-memory parallel (DMP) mode. In other cases, similar to pre- and postprocessing, the solution will proceed using only a shared-memory parallel (SMP) mode. In a few cases, a mixed mode may be implemented which tries to use as much distributed-memory parallelism as possible for maximum performance. These three modes are described further below.

Pure DMP mode
The simulation is fully supported by Distributed ANSYS, and distributed-memory parallelism is used throughout the solution. This mode typically provides optimal performance in Distributed ANSYS.

Mixed mode
The simulation involves an equation solver that is not supported by Distributed ANSYS. In this case, distributed-memory parallelism is used throughout the solution, except for the equation solver. When the equation solver is reached, the slave processes in Distributed ANSYS simply wait while the master process uses shared-memory parallelism to compute the equation solution. After the equation solution is computed, the slave processes continue to compute again until the entire solution is completed.

Pure SMP mode
The simulation involves an analysis type or feature that is not supported by Distributed ANSYS. In this case, distributed-memory parallelism is disabled at the onset of the solution, and shared-memory parallelism is used instead. The slave processes in Distributed ANSYS are not involved at all in the solution but simply wait while the master process uses shared-memory parallelism to compute the entire solution.

When using shared-memory parallelism inside of Distributed ANSYS (in mixed mode or SMP mode, including all pre- and postprocessing operations), the master process will not use more cores on the master machine than the total cores you specify to be used for the Distributed ANSYS solution. This is done to avoid exceeding the requested CPU resources or the requested number of licenses.

The following table shows which steps, including specific equation solvers, can be run in parallel using shared-memory ANSYS and Distributed ANSYS.

Table 4.1: Parallel Capability in Shared-Memory and Distributed ANSYS

Solvers/Feature                               Shared-Memory ANSYS    Distributed ANSYS
Sparse                                        Y                      Y
PCG                                           Y                      Y
ICCG                                          Y                      Y [1]
JCG                                           Y                      Y [1] [2]
QMR                                           Y                      Y [1]
Block Lanczos eigensolver                     Y                      Y [1]
PCG Lanczos eigensolver                       Y                      Y
Supernode eigensolver                         Y                      Y [1]
Subspace eigensolver                          Y                      Y
Unsymmetric eigensolver                       Y                      Y
Damped eigensolver                            Y                      Y
QR damp eigensolver                           Y                      Y [3]
Element formulation, results calculation      Y                      Y
Graphics and other pre- and postprocessing    Y                      Y [1]
1. This solver/operation only runs in mixed mode.
2. For static analyses and transient analyses using the full method (TRNOPT,FULL), the JCG equation solver runs in pure DMP mode only when the matrix is symmetric. Otherwise, it runs in SMP mode.
3. The QR damp eigensolver only runs in pure SMP mode.

The maximum number of cores allowed in a Distributed ANSYS analysis is currently set at 8192. Therefore, you can run Distributed ANSYS using anywhere from 2 to 8192 cores (assuming the appropriate HPC licenses are available) for each individual job. Performance results vary widely for every model when using any form of parallel processing. For every model, there is a point where using more cores does not significantly reduce the overall solution time. Therefore, it is expected that most models run in Distributed ANSYS cannot efficiently make use of hundreds or thousands of cores.

Files generated by Distributed ANSYS are named Jobnamen.ext, where n is the process number. (See Differences in General Behavior (p. 33) for more information.) The master process is always numbered 0, and the slave processes are 1, 2, etc. When the solution is complete and you issue the FINISH command in the SOLUTION processor, Distributed ANSYS combines all Jobnamen.RST files into a single Jobname.RST file, located on the master machine. Other files, such as .MODE, .ESAV, .EMAT, etc., may be combined as well upon finishing a distributed solution. (See Differences in Postprocessing (p. 36) for more information.)

The remaining sections explain how to configure your environment to run Distributed ANSYS, how to run a Distributed ANSYS analysis, and what features and analysis types are supported in Distributed ANSYS. You should read these sections carefully and fully understand the process before attempting to run a distributed analysis. The proper configuration of your environment and the installation and configuration of the appropriate MPI software are critical to successfully running a distributed analysis.
Distributed ANSYS allows you to use two cores without using any HPC licenses. Additional licenses will be needed to run Distributed ANSYS with more than two cores. Several HPC license options are available. For more information, see HPC Licensing in the Parallel Processing Guide.

If you are running on a single machine, there are no additional requirements for running Distributed ANSYS.

If you are running across multiple machines (e.g., a cluster), your system must meet these additional requirements to run Distributed ANSYS:

• Homogeneous network: All machines in the cluster must be the same type, OS level, chip set, and interconnects.
• You must be able to remotely log in to all machines, and all machines in the cluster must have identical directory structures (including the ANSYS installation, MPI installation, and on some systems, working directories). Do not change or rename directories after you've launched ANSYS. For more information on files that are written and their location, see Controlling Files that Distributed ANSYS Writes in the Parallel Processing Guide.
• All machines in the cluster must have ANSYS installed, or must have an NFS mount to the ANSYS installation. If not installed on a shared file system, ANSYS must be installed in the same directory path on all systems.
• All machines must have the same version of MPI software installed and running. The table below shows the MPI software and version level supported for each platform. For Linux platforms, the MPI software is included with the ANSYS installation. For Windows platforms, you must install the MPI software as described later in this document.
Table 4.2: Platforms and MPI Software

Windows 32-bit (Windows XP / Windows Vista / Windows 7)
  MPI Software: Platform MPI 8.2.1; Intel MPI 4.0.3
  More Information: Platform MPI: http://www.platform.com/cluster-computing/platform-mpi; Intel MPI: http://software.intel.com/en-us/articles/intel-mpi-library-documentation/

Windows 64-bit (Windows XP x64 / Windows Vista x64 / Windows 7 x64)
  MPI Software: Platform MPI 8.2.1; Intel MPI 4.0.3
  More Information: Platform MPI and Intel MPI links as above

Windows HPC Server 2008 x64
  MPI Software: Microsoft HPC Pack (MS MPI)
  More Information: http://www.microsoft.com/hpc/
ANSYS LS-DYNA
If you are running ANSYS LS-DYNA, you can use LS-DYNA's parallel processing (MPP or SMP) capabilities. Use the launcher or the command line method as described in Activating Distributed ANSYS in the Parallel Processing Guide to run LS-DYNA MPP. For Windows and Linux systems, please see the following table for LS-DYNA MPP MPI support. For more information on using ANSYS LS-DYNA in general, and its parallel processing capabilities specifically, see the ANSYS LS-DYNA User's Guide.

Table 4.3: LS-DYNA MPP MPI Support on Windows and Linux

MPI version for DYNA MPP    Platform MPI    MS MPI
32-bit Windows              n/a             n/a
64-bit Windows              X               X
64-bit Linux                X               n/a
2. Linux only: Set up the .rhosts file on each machine. The .rhosts file lists all machines in the cluster. The machines should be listed using their complete system name, as taken from hostname. For example, an .rhosts file for a two-machine cluster might look like this:
golinux1.ansys.com jqd
golinux2 jqd
Verify communication between machines via rsh or ssh (e.g., rsh golinux2 ls). You should not be prompted for a password. If you are, check the .rhosts permissions and machine names for correctness. For more information on using remote shells, see the man pages for rsh or ssh.

3. If you want the list of machines to be populated in the Mechanical APDL Product Launcher, you need to configure the hosts145.ans file. You can use the ANS_ADMIN utility to configure this file. You can manually modify the file later, but we strongly recommend that you use ANS_ADMIN to create this file initially to ensure that you establish the correct format.

Windows:
Start >Programs >ANSYS 14.5 >Utilities >ANS_ADMIN 14.5

Choose Configuration options, and then Configure Cluster to configure the hosts145.ans file.

1. Specify the directory in which the hosts145.ans will be configured: Select the Configure a hosts145.ans file in a directory you specify option and click OK. Enter a working directory. Click OK.
2. Enter the system name (from Step 1) in the Machine hostname field and click Add. On the next dialog box, enter the system type in the Machine type drop-down, and the number of processors in the Max number of jobs/processors field and click OK for each machine in the cluster. When you are finished adding machines, click Close, then Exit.

An example hosts145.ans file where machine1 has 2 processors and machine2 has 4 processors would look like this:

machine1 intel 0 2 0 0 MPI 1 1
machine2 intel 0 4 0 0 MPI 1 1

Linux:
/ansys_inc/v145/ansys/bin/ans_admin145

Choose ANSYS/Workbench Configuration, and then click Configure Cluster. Under Select file to configure, choose the hosts145.ans file to be configured and choose Configure for Distributed ANSYS. Click OK. Then enter the system name (from Step 1) in the Machine hostname field and click Add. On the next dialog box, enter the system type in the Machine type drop-down, and the number of processors in the Max number of jobs/processors field for each machine in the cluster. Click Add. When you are finished adding machines, click Close.

The hosts145.ans should be located in your current working directory, your home directory, or the apdl directory.
4. Windows only: Verify that all required environment variables are properly set. If you followed the post-installation instructions described above for Microsoft HPC Pack (Windows HPC Server 2008), these variables should be set automatically.

On the head node, where ANSYS is installed, check these variables:

ANSYS145_DIR=C:\Program Files\ANSYS Inc\v145\ansys
ANSYSLIC_DIR=C:\Program Files\ANSYS Inc\Shared Files\Licensing

where C:\Program Files\ANSYS Inc is the location of the product install and C:\Program Files\ANSYS Inc\Shared Files\Licensing is the location of the licensing install. If your installation locations are different than these, specify those paths instead.

On Windows systems, you must use the Universal Naming Convention (UNC) for all ANSYS environment variables on the compute nodes for Distributed ANSYS to work correctly. On the compute nodes, check these variables:

ANSYS145_DIR=\\head_node_machine_name\ANSYS Inc\v145\ansys
ANSYSLIC_DIR=\\head_node_machine_name\ANSYS Inc\Shared Files\Licensing

For Distributed LS-DYNA: On the head node and the compute nodes, set LSTC_LICENSE to ANSYS. This tells the LS-DYNA executable to use ANSYS licensing. Since the LS-DYNA run will use ANSYS licensing for LS-DYNA, you do not need to set LSTC_LICENSE_SERVER.

5. Windows only: Share out the ANSYS Inc directory on the head node with full permissions so that the compute nodes can access it.
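To confirm the settings on a compute node, you can echo the variables from a command prompt. This is a hedged illustration; the expected output is the UNC paths shown above.

echo %ANSYS145_DIR%
echo %ANSYSLIC_DIR%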
MPI_REMSH - Set this environment variable to specify a remote shell (for example, ssh) to use instead of the default remote shell (rsh). Note that selecting the Use Secure Shell instead of Remote Shell option on the launcher will override MPI_REMSH, if MPI_REMSH is not set or is set to a different location. You can also issue the -usessh command line option to use ssh instead of rsh. The command line option will override the environment variable setting as well.

MPI_WORKDIR - Set this environment variable to specify a working directory on either the master and all nodes, or on specific nodes individually. For more information, see Controlling Files that Distributed ANSYS Writes.

MPI_IC_ORDER - Set this environment variable to specify the order in which the interconnects on the system are to be used. The interconnects will be tried in the order listed from left to right. If an interconnect is listed in uppercase, no interconnects listed after that one will be tried. If MPI_IC_ORDER is not set, the fastest interconnect available on the system is used. See the Platform MPI documentation for more details.

MPI_ICLIB_<interconnect> - Set this environment variable to the interconnect location if the interconnect is not installed in the default location:
setenv MPI_ICLIB_GM <path>/lib64/libgm.so
See the Platform MPI documentation for the specific interconnect names (e.g., MPI_ICLIB_GM).

MPIRUN_OPTIONS - Set this environment variable to -prot to display a grid of interconnects among the systems being used for distributed processing.

On Linux systems running Intel MPI: Issue the command line option -usessh to use ssh instead of rsh. See the Intel MPI reference manual (for Linux) for further information and additional environment variables and their settings: http://software.intel.com/en-us/articles/intel-mpi-library-documentation/.

To verify that these environment variables are set correctly on each machine, run:
rsh machine1 env
On Windows systems, you can set the following environment variables to display the actual mpirun command issued from ANSYS:

ANS_SEE_RUN = TRUE
ANS_CMD_NODIAG = TRUE
For Intel MPI, issue the following command:
mpitest145 -mpi intelmpi -machines machine1:2
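For Platform MPI (the default), the same test presumably omits the -mpi option; a hedged sketch:

mpitest145 -machines machine1:2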
You can use any of the same command line arguments (such as -machines) with the mpitest program as you can with Distributed ANSYS.

On Windows: Issue the following command to run a local test on Windows using Platform MPI:
ansys145 -np 2 -mpitest
Use the following procedure to run a distributed test on Windows using Platform MPI:

1. Create a file named machines in your local/home directory.
2. Open the machines file in an editor. Add your master and slave machines in your cluster. For example, in this cluster of two machines, the master machine is gowindows1. List the machine name separately for each core on that machine. For example, if gowindows1 has four processors and gowindows2 has two, the machines file would look like this:

gowindows1
gowindows1
gowindows1
gowindows1
gowindows2
gowindows2

3. From a command prompt, navigate to your working directory. Run the following:
ansys145 -mpifile machines -mpitest
Hardware for specific types of interconnects is generally incompatible with other proprietary interconnect types (except Ethernet and GiGE). Systems can have a network of several different types of interconnects. Each interconnect must be assigned a unique hostname and IP address.

On Windows x64 systems, use the Network Wizard in the Compute Cluster Administrator to configure your interconnects. See the Compute Cluster Pack documentation for specific details on setting up the interconnects. You may need to ensure that Windows Firewall is disabled for Distributed ANSYS to work correctly.
2. Select the correct environment and license.
3. Go to the High Performance Computing Setup tab. Select Use Distributed Computing (MPP).
Specify the MPI type to be used for this distributed run. MPI types include:

• MS MPI (Windows 64-bit only)
• PCMPI (Platform MPI)
• Intel MPI

If you choose MS MPI, you cannot specify multiple hosts or an MPI file. All other platforms allow only one type of MPI. See Table 4.2: Platforms and MPI Software (p. 18) for the specific MPI version for each platform.

Choose whether you want to run on a local machine, specify multiple hosts, or specify an existing MPI file (such as a host.list or a Platform MPI appfile):

• If local machine, specify the number of processors on that machine.
• If multiple hosts, select the machines you want to use from the list of available hosts. The list of available hosts is populated from the hosts145.ans file. Click on the machines you want to use and click Add to move them to the Selected Hosts list to use them for this run. If you click Add more than once for a machine, the number of processors for that machine will increment each time, up to the maximum allowed in the hosts145.ans file. (Note that the Select Multiple Hosts option is not available when running the LS-DYNA MPP version on a Windows system.)
  You can also add or remove a host, but be aware that adding or removing a host from here will modify only this run; the hosts145.ans file will not be updated with any new information from this dialog box.
• If specifying an MPI file, type in the full path to the file, or browse to the file. If typing in the path, you must use the absolute path.

Additional Options for Linux systems using Platform MPI

On these systems, you can choose to use secure shell (SSH) instead of remote shell (RSH). This option will override MPI_REMSH, if the path to SSH is different. See Optional Setup Tasks (p. 22) for more information on MPI_REMSH. ANSYS uses RSH as the default, whereas Platform MPI uses SSH as the default.

If you are using the launcher, you can select the Use launcher-specified working directory on all nodes option on the High Performance Computing Setup tab. This option uses the working directory as specified on the File Management tab as the directory structure on the master and all nodes. If you select this option, all machines will require the identical directory structure matching the working directory specified on the launcher. This option will override any existing MPI_WORKDIR settings on the master or the nodes.

4. Click Run to launch ANSYS.
Running on a Local Host

If you are running Distributed ANSYS locally (i.e., running across multiple processors on a single machine), you need to specify the number of processors:
ansys145 -dis -np n
You may also need to specify the MPI software using the -mpi command line option:

-mpi pcmpi : Platform MPI (default)
-mpi intelmpi : Intel MPI

If you are using Platform MPI, you do not need to specify the MPI software via the command line option. To specify Intel MPI, use the -mpi intelmpi command line option as shown below:
ansys145 -dis -mpi intelmpi -np n
For example, if you run a job in batch mode on a local host using four cores with an input file named input1 and an output file named output1, the launch commands for Linux and Windows would be as shown below. On Linux:
ansys145 -dis -np 4 -b < input1 > output1 (for default Platform MPI)
or
ansys145 -dis -mpi intelmpi -np 4 -b < input1 > output1 (for Intel MPI)
or
ansys145 -dis -mpi intelmpi -np 4 -b -i input1 -o output1 (for Intel MPI)
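On Windows, a comparable local-host batch launch would presumably use the -i and -o options rather than shell redirection; a hedged sketch:

ansys145 -dis -np 4 -b -i input1 -o output1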
Running on Multiple Hosts

If you are running Distributed ANSYS across multiple hosts, you need to specify the number of cores on each machine:
ansys145 -dis -machines machine1:np:machine2:np:machine3:np
You may also need to specify the MPI software. The default is Platform MPI. To specify Intel MPI, use the -mpi command line option as shown below:
ansys145 -dis -mpi intelmpi -machines machine1:np:machine2:np:machine3:np
For example, if you run a job in batch mode using two machines (one with four cores and one with two cores), with an input file named input1 and an output file named output1, the launch commands for Linux and Windows would be as shown below. On Linux:
ansys145 -dis -b -machines machine1:4:machine2:2 < input1 > output1 (for default Platform MPI)
or
ansys145 -dis -mpi intelmpi -b -machines machine1:4:machine2:2 < input1 > output1 (for Intel MPI)
or
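Presumably the remaining form mirrors the local-host Intel MPI example above and uses the -i and -o options (a hedged reconstruction):

ansys145 -dis -mpi intelmpi -b -machines machine1:4:machine2:2 -i input1 -o output1 (for Intel MPI)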
The first machine specified with -machines in a Distributed ANSYS run must be the host machine and must contain any files necessary for the initiation of your job (i.e., input file, database file, etc.). If both the -np and -machines options are used on a single command line, the -np will be ignored. Note that the -machines option is not available when running the LS-DYNA MPP version on a Windows system.
For an Intel MPI appfile, include the -mpi command line option:
ansys145 -dis -mpi intelmpi -mpifile appfile_name
The format of the appfile is system-dependent. If the file is not in the current working directory, you will need to include the full path to the file. The file must reside on the local machine. You cannot use the -mpifile option in conjunction with the -np (local host) or -machines (multiple hosts) options.
If the Specify Multiple Hosts launcher option or the -machines command line option was used, ANSYS generates a default appfile named host.list. You can rename this file, move it, or modify it for future runs if necessary. See the documentation for your vendor/MPI type for details on working with the appfile.

Using the Platform MPI appfile

Platform MPI uses an appfile to define the machines in the array (or the local host). A typical Platform MPI appfile might look like this:
-h mach1 -np 2 /ansys_inc/v145/ansys/bin/ansysdis145 -dis
-h mach2 -np 2 /ansys_inc/v145/ansys/bin/ansysdis145 -dis
See the Platform MPI user documentation for details on working with the Platform MPI appfile.

Using the Intel MPI appfile

Intel MPI uses an appfile to define the machines in the array (or the local host). A typical Intel MPI appfile might look like this:
-host mach1 -env env_vars env_var_settings -np 2 /ansys_inc/v145/ansys/bin/ansysdis145 -dis -mpi INTELMPI
-host mach2 -env env_vars env_var_settings -np 2 /ansys_inc/v145/ansys/bin/ansysdis145 -dis -mpi INTELMPI
All working directories that are specified must exist before running a job.

Linux systems with Intel MPI: By default, you must have identical working directory structures set up on the master and all slave machines. If you don't, Distributed ANSYS will fail to launch. Distributed ANSYS will always use the current working directory on the master machine and will expect identical directory structures to exist on all slave nodes. If you are using the launcher, the working directory specified on the File Management tab is the directory that Distributed ANSYS will expect.

Windows systems with Microsoft MPI, Platform MPI, or Intel MPI: By default, you must have identical working directory structures set up on the master and all slave machines. If you don't, Distributed ANSYS will fail to launch. Distributed ANSYS will always use the current working directory on the master machine and will expect identical directory structures to exist on all slave nodes. If you are using the launcher, the working directory specified on the File Management tab is the directory that Distributed ANSYS will expect.
• Harmonic analyses using the full, Variational Technology, or Variational Technology perfect absorber method (HROPT,FULL; VT; VTPA; AUTO).
• Transient dynamic analyses using the full or Variational Technology method (TRNOPT,FULL; VT).
• Radiation analyses using the radiosity method.
• Low-frequency electromagnetic analysis using only the following elements: SOLID96, SOLID97, SOLID122, SOLID123, SOLID231, SOLID232, SOLID236, SOLID237, and SOURC36 (when used in a model with the above elements only).
• High-frequency electromagnetic analysis using elements HF118, HF119, and HF120 with KEYOPT(1) = 0 or 1 (first order element option).
• Coupled-field analyses using only the following elements: PLANE223, SOLID226, SOLID227.
• Superelements in the use pass of a substructuring analysis.
• Cyclic symmetry analyses.

The following analysis types are supported and use distributed-memory parallelism throughout the Distributed ANSYS solution, except for the equation solver, which uses shared-memory parallelism. (In these cases, the solution runs in mixed mode.)

• Static and full transient analyses (linear or nonlinear) that use the JCG or ICCG equation solvers. Note that when the JCG equation solver is used in these analysis types, the JCG solver will actually run using distributed-memory parallelism (that is, pure DMP mode) if the matrix is symmetric and the fast thermal option (THOPT,QUASI; LINEAR) is not being used.
• Buckling analyses using the Block Lanczos eigensolver (BUCOPT,LANB).
• Modal analyses using the Block Lanczos or Supernode eigensolver (MODOPT,LANB; SNODE).
• Full harmonic analyses using the JCG, ICCG, or QMR equation solvers.

The following analysis types are supported but do not use distributed-memory parallelism within Distributed ANSYS. (The solution runs in pure SMP mode.)

• Modal analyses using the QRDAMP eigensolver (MODOPT,QRDAMP).
• Harmonic analyses using the mode superposition method (HROPT,MSUP).
• Transient dynamic analyses using the mode superposition method (TRNOPT,MSUP).
• Substructure analyses involving the generation pass or expansion pass.
• Spectrum analyses.
• Expansion passes for reduced analyses.

Blocked Analysis Types

• FLOTRAN analyses.
• Variational Technology options on the STAOPT and TRNOPT commands.
• ANSYS Multi-field solver - multiple code coupling (MFX).
• ANSYS Multi-field solver - single code (MFS).
• Arc-length method (ARCLEN).
• Inertia relief (IRLF,1).
• Probabilistic design (/PDS commands).
• Element morphing.
• Automatic substructuring.
Only the master process will save and resume the Jobname.DB file. If a non-default file name is specified (e.g., via /ASSIGN), then Distributed ANSYS behaves the same as shared-memory ANSYS.

After a parallel solution successfully completes, Distributed ANSYS automatically merges some of the (local) Jobnamen.EXT files into a single (global) file named Jobname.EXT. These include the .RST (or .RTH), .ESAV, .EMAT, .MODE, .IST, .MLV, and .SELD files. This action is performed when the FINISH command is executed upon leaving the solution processor. These files contain the same information about the final computed solution as files generated for the same model computed with shared-memory ANSYS. Therefore, all downstream operations (such as postprocessing) can be performed using shared-memory ANSYS (or in the same manner as shared-memory ANSYS) by using these global files. If any of these global Jobname.EXT files are not needed for downstream operations, you can reduce the overall solution time by suppressing the file combination for individual file types (see the DMPOPTION command for more information). If it is later determined that a global Jobname.EXT file is needed for a subsequent operation or analysis, the local files can be combined by using the COMBINE command.

Distributed ANSYS does not delete most files written by the slave processes when the analysis is completed. If you choose, you can delete these files when your analysis is complete (including any restarts that you may wish to perform). If you do not wish to have the files necessary for a restart saved, you can issue RESCONTROL,NORESTART. File copy, delete, and rename operations can be performed across all processes by using the DistKey option on the /COPY, /DELETE, and /RENAME commands. This provides a convenient way to manage local files created by a distributed parallel solution. For example, /DELETE,Fname,Ext,,ON automatically appends the process rank number to the specified file name and deletes Fnamen.Ext from all processes. See the /COPY, /DELETE, and /RENAME command descriptions for more information.

Batch and Interactive Mode

You can launch Distributed ANSYS in either interactive or batch mode for the master process. However, the slave processes always run in batch mode. The slave processes cannot read the START145.ANS or STOP145.ANS files. The master process sends all /CONFIG,LABEL commands to the slave processes as needed. On Windows systems, there is no ANSYS output console window when running the Distributed ANSYS GUI (interactive mode). All standard output from the master process is written to a file named file0.out. (Note that the jobname is not used.)
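Returning to the file-management commands described above, the following input fragment shows one way they might be combined. This is only an illustrative sketch; the choice of which file types to suppress and delete is arbitrary, and the commands shown (DMPOPTION, COMBINE, and /DELETE with the distributed-key option) are the ones referenced in this section:

   DMPOPTION,ESAV,NO     ! do not combine the local .ESAV files at FINISH
   DMPOPTION,EMAT,NO     ! do not combine the local .EMAT files at FINISH
   SOLVE
   FINISH
   COMBINE,RST           ! later, combine the local results files into a global .RST if needed
   /DELETE,,esav,,ON     ! delete Jobnamen.ESAV on all processes (rank number appended automatically)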
Output Files

When a Distributed ANSYS job is executed, the output for the master process is written in the same fashion as shared-memory ANSYS. In other words, by default the output is written to the screen, or, if you specified an output file via the launcher or the -o command line option, the output for the master process is written to that file. Distributed ANSYS automatically writes the ASCII output from each slave process to Jobnamen.OUT. Normally, these slave process output files have little value because all of the relevant job information is written to the screen or to the master process output file.

Error Handling

The same principle also applies to the error file Jobnamen.ERR. When a warning or error occurs on one of the slave processes during the Distributed ANSYS solution, the process writes that warning or error message to its error file and then communicates the warning or error message to the master process. Typically, this allows the master process to write the warning or error message to its error file and output file and, in the case of an error message, allows all of the Distributed ANSYS processes to exit the program simultaneously. In some cases, an error message may fail to be fully communicated to the master process. If this happens, you can view each Jobnamen.ERR and/or Jobnamen.OUT file in an attempt to learn why the job failed. In some rare cases, the job may hang. When this happens, you must manually kill the processes; the error files and output files written by all the processes will be incomplete but may still provide some useful information as to why the job failed.

Use of APDL

In pre- and postprocessing, APDL works the same in Distributed ANSYS as in shared-memory ANSYS. However, in the solution processor (/SOLU), Distributed ANSYS does not support certain *GET items. In general, Distributed ANSYS supports global solution *GET results, such as total displacements and reaction forces. It does not support element-level results specified by ESEL, ESOL, and ETABLE labels. Unsupported items will return a *GET value of zero.

Multiple commands entered in an ANSYS input file can be condensed into a single line if the commands are separated by the $ character (see Condensed Data Input in the Command Reference). Distributed ANSYS cannot properly handle condensed data input. Each command must be placed on its own line in the input file for a Distributed ANSYS run.
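For example, an input line written with condensed data input must be split into one command per line before it is run in Distributed ANSYS (the specific commands shown here are arbitrary illustrations):

   ! Condensed input - not supported in Distributed ANSYS:
   /SOLU $ ANTYPE,STATIC $ NLGEOM,ON

   ! Equivalent input for a Distributed ANSYS run - one command per line:
   /SOLU
   ANTYPE,STATIC
   NLGEOM,ON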
ANSYS works on the entire model again (that is, it behaves like shared-memory ANSYS).

Print Output (OUTPR Command)

In Distributed ANSYS, the OUTPR command prints NSOL and RSOL in the same manner as in shared-memory ANSYS. However, for other items such as ESOL, Distributed ANSYS prints only the element solution on the CPU domain of the master process. Therefore, OUTPR,ESOL has incomplete information and is not recommended. Also, the order of elements differs from that of shared-memory ANSYS due to domain decomposition, so a direct one-to-one comparison of the printed element output with shared-memory ANSYS is not possible when using OUTPR.

In all HPC products (both shared-memory ANSYS and Distributed ANSYS), the program can handle a large number of coupling and constraint equations (CE/CP) and contact elements. However, specifying too many of these items forces Distributed ANSYS to communicate more data among the processes, resulting in a longer elapsed time to complete a distributed parallel job. You should reduce the number of CE/CP if possible and define potential contact pairs over as small a region as possible to avoid degrading performance. In addition, for assembly contact pairs or small-sliding contact pairs, you can use the command CNCHECK,TRIM to remove contact and target elements that are initially in the far-field (open and not near contact). This trimming option helps to achieve better performance in Distributed ANSYS runs.
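A minimal sketch of applying this trimming before the solution is shown below. The surrounding load-step structure is hypothetical; CNCHECK,TRIM is the command named above:

   /SOLU
   CNCHECK,TRIM      ! remove contact/target elements that are initially open and far from contact
   SOLVE
   FINISH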
You can use the COMBINE command to combine the local results files into a single, global results file.
If the original analysis was run using 8 cores (-dis -np 8), then for the multiframe restart you must also use 8 cores (-dis -np 8), and the files from the original analysis that are required for the restart must be located in the current working directory.

When running across machines, the job launch procedure (or script) used when restarting Distributed ANSYS must not be altered following the first load step and first substep. In other words, you must use the same number of machines, the same number of cores on each machine, and the same host (master) and slave relationships among these machines in the restart job that follows. For example, suppose you use the following command line for the first load step:
ansys145 -dis -machines mach1:4:mach2:1:mach3:2 -i input -o output1
where the host machine (which always appears first in the list of machines) is mach1, and the slave machines are mach2 and mach3. Then for the multiframe restart, you must use a command line such as this:
ansys145 -dis -machines mach7:4:mach6:1:mach5:2 -i restartjob -o output2
This command line uses the same number of machines (3), the same number of cores for each machine in the list (4:1:2), and the same host/slave relationship (4 cores on the host, 1 core on the first slave, and 2 cores on the second slave). Any alterations in the -machines field, other than the actual machine names, will result in restart failure. Finally, the files from the original analysis that are required for the restart must be located in the current working directory on each of the machines.

The files needed for a restart must be available on the machine(s) used for the restarted analysis. Each machine has its own restart files that are written from the previous run, and the restart process needs these files to perform the correct restart actions. For the first example above, if the two analyses (-dis -np 8) are performed in the same working directory, no action is required; the restart files will already be available. However, if the restart is performed in a new directory, all of the restart files listed in Table 4.4: Required Files for Multiframe Restarts must be copied (or moved) into the new directory before performing the multiframe restart. For the second example above, the restart files listed in the Host Machine column in Table 4.4 must be copied (or moved) from mach1 to mach7, and
all of the files in the Slave Machines column must be copied (or moved) from mach2 to mach6 and from mach3 to mach5 before performing the multiframe restart.

Table 4.4: Required Files for Multiframe Restarts

Host Machine:
Jobname.LDHI
Jobname.RDB
Jobname0.Xnnn [1]
Jobname0.RST (this is the local .RST file for this domain)

Slave Machines:
Jobnamen.Xnnn
Jobnamen.RST (this is the local .RST file for this domain)
1. The .Xnnn file extension mentioned here refers to the .Rnnn and .Mnnn files discussed in Multiframe Restart Requirements in the Basic Analysis Guide.

In all restarts, the result file Jobname.RST (or Jobname.RTH or Jobname.RMG) on the host machine is re-created after each solution by merging the Jobnamen.RST files again. If you do not require a restart, issue RESCONTROL,NORESTART in the run to remove, or to avoid writing, the restart files on the host and slave machines. If you use this command, the slave processes will not have files such as .ESAV, .OSAV, .RST, or .X000 in the working directory at the end of the run. In addition, the host process will not have files such as .ESAV, .OSAV, .X000, .RDB, or .LDHI at the end of the run. The program removes all of these scratch files at the end of the solution phase (FINISH or /EXIT). This option is useful for file cleanup and control.
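For the first example above (an 8-core run restarted from a new working directory on the same machine), the copy step might look like the following shell commands. This is only an illustration; the job name tutor, the directory names, and the .R001 restart-file extension are assumptions, not requirements:

   # Copy the host-process files and the first domain's local files into the new directory
   cp /home/jqd/oldrun/tutor.ldhi   /home/jqd/newrun/
   cp /home/jqd/oldrun/tutor.rdb    /home/jqd/newrun/
   cp /home/jqd/oldrun/tutor0.rst   /home/jqd/newrun/
   cp /home/jqd/oldrun/tutor0.r001  /home/jqd/newrun/
   # Repeat for the remaining domains (tutor1 through tutor7), for example:
   cp /home/jqd/oldrun/tutor1.rst   /home/jqd/newrun/
   cp /home/jqd/oldrun/tutor1.r001  /home/jqd/newrun/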
Example Problems
The mpitest program should start without errors. If it does not, check your paths, .rhosts file, and permissions; correct any errors; and rerun.

Part B: Setup and Run a Distributed Solution

1. Set up identical installation and working directory structures on all machines (master and slaves) in the cluster.
2. Install ANSYS 14.5 on the master machine, following the typical installation process.
3. Install ANSYS 14.5 on the slave machines.

Steps 2 and 3 above will install all necessary components on your machines, including Platform MPI 8.2.

4. Type hostname on each machine in the cluster. Note the name of each machine. You will need this name to set up both the .rhosts file and the Configure Cluster option of the ANS_ADMIN145 utility.
5. Set up the .rhosts file on each machine. The .rhosts file lists each machine in the cluster, followed by your username. The machines should be listed using their complete system name, as taken from uname. For example, each .rhosts file for our two-machine cluster looks like this (where golinux1 and golinux2 are example machine names, and jqd is an example username):

golinux1 jqd
golinux2 jqd
6. Verify communication between machines via rsh. If the communication between machines is happening correctly, you will not need a password.
7. Run the ANS_ADMIN145 utility:

/ansys_inc/v145/ansys/bin/ans_admin145

8. Choose ANSYS / Workbench Configuration, and then click Configure Cluster.
9. Under Select file to configure, choose the hosts145.ans file to be configured and choose Configure for Distributed ANSYS. Click OK. Then enter the system name (from Step 4) in the Machine hostname field and click Add. On the next dialog box, enter the system type in the Machine type drop-down, and the number of processors in the Max number of jobs/processors field for each machine in the cluster. Click OK. When you are finished adding machines, click Close and then click Exit to leave the ANS_ADMIN145 utility.

The resulting hosts145.ans file using our example machines would look like this:
golinux1 linem64t 0 1 0 0 /home/jqd MPI 1 1
golinux2 linem64t 0 1 0 0 /home/jqd MPI 1 1
11. Select the correct environment and license.
12. Go to the High Performance Computing Setup tab. Select Use Distributed Computing (MPP). You must also specify either local machine or multiple hosts. For multiple hosts, select the machines you want to use from the list of available hosts. The list of available hosts is populated from the hosts145.ans file. Click on the machines you want to use and click Add to move them to the Selected Hosts list to use them for this run. You can also add or remove a host, but be aware that adding or removing a host from here will modify only this run; the hosts145.ans file will not be updated with any new information from this dialog box. If necessary, you can also run secure shell by selecting Use Secure Shell instead of Remote Shell (ssh instead of rsh).
13. Click Run to launch ANSYS.
14. In ANSYS, select File>Read Input From and navigate to tutor1_carrier_linux.inp or tutor2_carrier_modal.inp.
15. The example will progress through the building, loading, and meshing of the model. When it stops, select Main Menu>Solution>Analysis Type>Sol'n Controls.
16. On the Solution Controls dialog box, click on the Sol'n Options tab.
17. Select the Pre-Condition CG solver.
18. Click OK on the Solution Controls dialog box.
19. Solve the analysis. Choose Main Menu>Solution>Solve>Current LS. Click OK.
20. When the solution is complete, you can postprocess your results as you would with any analysis. For example, you could select Main Menu>General Postproc>Read Results>First Set and select the desired result item to display.
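If you prefer to run the same tutorial in batch mode from the command line rather than through the launcher, an invocation along the following lines should be equivalent. This is only a sketch: the output file name and the two-core-per-machine split are placeholders, and golinux1/golinux2 are the example machine names used above:

   ansys145 -dis -machines golinux1:2:golinux2:2 -b -i tutor1_carrier_linux.inp -o tutor1.out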
9. Start ANSYS using the launcher: Start> Programs> ANSYS 14.5> Mechanical APDL Product Launcher 14.5.
10. Select ANSYS Batch as the Simulation Environment, and choose a license. Specify tutor1_carrier_win.inp or tutor2_carrier_modal.inp as your input file. Both of these examples use the PCG solver. You must specify your working directory to be the location where this file is located.
11. Go to the High Performance Computing Setup tab. Select Use Distributed Computing (MPP). You must specify either local machine or multiple hosts. For multiple hosts, select the machines you want to use from the list of available hosts. The list of available hosts is populated from the hosts145.ans file. Click on the machines you want to use and click Add to move them to the Selected Hosts list to use them for this run. Click on a machine in Selected Hosts and click Edit if you wish to add multiple processors for that host. You can also add or remove a host, but be aware that adding or removing a host from here will modify only this run; the hosts145.ans file will not be updated with any new information from this dialog box.
12. Click Run.
13. When the solution is complete, you can postprocess your results as you would with any analysis.
4.6. Troubleshooting
This section describes problems that you may encounter while using Distributed ANSYS, as well as methods for overcoming them. Some of these problems are specific to a particular system, as noted.
Verify that the control node is included in its own .rhosts file. If you run on Linux, be sure you have run the following command:
chmod 600 .rhosts
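To confirm that remote-shell access works without a password prompt, a quick check such as the following can be used; golinux2 is the example slave machine name used earlier in this guide:

   rsh golinux2 ls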
MPI: could not run executable

If you encounter this message, verify that you have the correct version of MPI and that it is installed correctly, and verify that you have a .rhosts file on each machine. If not, create a .rhosts file on all machines where you will run Distributed ANSYS, make sure the permissions on the file are 600, and include an entry for each hostname where you will run Distributed ANSYS.

Error executing ANSYS. Refer to System-related Error Messages in the ANSYS online help. If this was a Distributed ANSYS job, verify that your MPI software is installed correctly, check your environment settings, or check for an invalid command line option.

You may encounter the above message when setting up Platform MPI or running Distributed ANSYS using Platform MPI on a Windows platform. This may occur if you did not correctly run the set password bat file. Verify that you completed this item according to the Platform MPI installation readme instructions. You may also see this error if Ansys Inc\v145\ansys\bin\<platform> (where <platform> is intel or winx64) is not in your PATH. If you need more detailed debugging information, do the following:

1. Open a Command Prompt window and set the following environment variables:

SET ANS_SEE_RUN=TRUE
SET ANS_CMD_NODIAG=TRUE

2. Run the following command line:

ansys145 -b -dis -i myinput.inp -o myoutput.out
Distributed ANSYS fails to launch when running from a fully qualified pathname.

Distributed ANSYS will fail to launch from a fully qualified pathname if the ANSYS installation path contains a space followed by a dash and %ANSYS145_DIR%\bin\<platform> (where <platform> is intel or winx64) is not in the system PATH. Add %ANSYS145_DIR%\bin\<platform> to the system PATH and invoke ansys145 without the fully qualified pathname. For example, launching from the fully qualified pathname will fail if your installation path is:
C:\Program Files\Ansys -Inc\v145\bin\<platform>
However, if you add C:\Program Files\Ansys -Inc\v145\bin\<platform> to the system PATH, you can successfully launch Distributed ANSYS by using the following command:
ansys145 -g
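On Windows, the PATH can be extended for the current Command Prompt session as shown below. This is only an illustration and assumes the winx64 platform directory:

   REM Add the ANSYS bin directory to the PATH for this session (winx64 assumed)
   SET PATH=%ANSYS145_DIR%\bin\winx64;%PATH%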
The required licmsgs.dat file, which contains licensing-related messages, was not found or could not be opened. The following path was determined using environment variable ANSYS145_DIR. This is a fatal error -- exiting.

Check the ANSYS145_DIR environment variable to make sure it is set properly. Note that for Windows HPC clusters, the ANSYS145_DIR environment variable should be set to \\HEADNODE\Ansys
Inc\v145\ansys, and the ANSYSLIC_DIR environment variable should be set to \\HEADNODE\Ansys Inc\Shared Files\Licensing on all nodes.

mpid: Cannot set work directory: No such file or directory
mpid: MPI_WORKDIR=<dir>

You will see this message if you set the MPI_WORKDIR environment variable but the specified directory doesn't exist. If you set MPI_WORKDIR on the master, this message can also appear if the directory doesn't exist on one of the slaves. In Distributed ANSYS, if you set the MPI_WORKDIR environment variable on the master node, Distributed ANSYS will expect all slave nodes to have the same directory.

WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only. (Intel MPI)

When using Intel MPI and running across machines, you must have an mpd.hosts file in your working directory that contains a line for each machine. Otherwise, you will encounter the following error:

WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
mpiexec: unable to start all procs; may have invalid machine names
remaining specified hosts:
xx.x.xx.xx (hostname)

mpid: MPI BUG: requested interconnect not available

The default locations for GM (Myrinet) and its libraries are /opt/gm and /opt/gm/lib, respectively. If the libraries are not found, you may see this message. To specify a different location for the GM libraries, set the MPI_ICLIB_GM environment variable:
setenv MPI_ICLIB_GM <path>/lib/libgm.so
AMD Opteron and other 64-bit systems may have a specific 64-bit library subdirectory, /lib64. On these systems, you need to point to this location:
setenv MPI_ICLIB_GM <path>/lib64/libgm.so
Note
The environment variable needs to be set on each system (such as in the .cshrc or .login files).
Be sure to kill any lingering processes (Linux: type kill -9 from the command level; Windows: use the Task Manager) on all processors and start the job again.

Job fails with SIGTERM signal (Linux only)

Occasionally, when running on Linux, a simulation may fail with a message like the following:

MPI Application rank 2 killed before MPI_Finalize() with signal 15
forrtl: error (78): process killed (SIGTERM)

This typically occurs when computing the solution and means that the system has killed the ANSYS process. The two most common causes are: (1) ANSYS is using too much of the hardware resources and the system has killed the ANSYS process, or (2) a user has manually killed the ANSYS job (for example, with the kill -9 system command). Users should check the size of the job they are running in relation to the amount of physical memory on the machine. Most often, decreasing the model size or finding a machine with more RAM will result in a successful run.

Poor Speedup or No Speedup

As more cores are used, runtimes are generally expected to decrease. The biggest relative gains are typically achieved when using two cores compared to using a single core. When significant speedups are not seen as additional cores are used, the reasons may involve both hardware and software issues. These include, but are not limited to, the following situations.

Hardware

Oversubscribing hardware: In a multiuser environment, this could mean that more physical cores are being used by ANSYS simulations than are available on the machine. It could also mean that hyperthreading is activated. Hyperthreading typically involves enabling extra virtual cores, which can sometimes allow software programs to more effectively use the full processing power of the CPU. However, for compute-intensive programs such as ANSYS, using these virtual cores rarely provides a significant reduction in runtime. Therefore, it is recommended that you not use hyperthreading when running the ANSYS program; if hyperthreading is enabled, it is recommended that you not exceed the number of physical cores.

Lack of memory bandwidth: On some systems, using most or all of the available cores can result in a lack of memory bandwidth, which can impact the overall scalability of the ANSYS software.

Slow interconnect speed: When running Distributed ANSYS across multiple machines, the speed of the interconnect (GigE, Myrinet, Infiniband, etc.) can have a significant impact on performance. Slower interconnects cause each Distributed ANSYS process to spend extra time waiting for data to be transferred from one machine to another. This becomes especially important as more machines are involved in the simulation. See Interconnect Configuration for the recommended interconnect speed.

Software

Simulation includes non-supported features: The shared- and distributed-memory parallelism works to speed up certain compute-intensive operations in /PREP7, /SOLU, and /POST1. However, not all operations are parallelized. If a particular operation that is not parallelized dominates the simulation time, then using additional cores will not help achieve a faster runtime.

Simulation has too few DOF (degrees of freedom): Some analyses (such as transient analyses) may require long compute times, not because the number of DOF is large, but because a large number of calculations are performed (that is, a very large number of time steps). Generally, if the number of DOF is relatively small, parallel processing will not significantly decrease the solution time.
Consequently, for small models with many time steps, parallel performance may be poor because the model size is too small to fully utilize a large number of cores.

I/O cost dominates solution time: For some simulations, the amount of memory required to obtain a solution is greater than the physical memory (that is, RAM) available on the machine. In these cases, either virtual memory (that is, hard disk space) is used by the operating system to hold the data that would otherwise be stored in memory, or the equation solver writes extra files to the disk to store data. In both cases, the extra I/O done using the hard drive can significantly impact performance, making I/O the main bottleneck to achieving optimal performance. In these cases, using additional cores will typically not result in a significant reduction in overall time to solution.

Large contact pairs: For simulations involving contact pairs with a large number of elements relative to the total number of elements in the entire model, the performance of Distributed ANSYS is often negatively impacted. These large contact pairs require Distributed ANSYS to do extra communication and often cause a load imbalance between the cores (that is, one core might have two times more computations to perform than another core). In some cases, using CNCHECK,TRIM can help trim any unnecessary contact/target elements from the larger contact pairs. In other cases, however, manual intervention will be required to reduce the number of elements involved in the larger contact pairs.

Different Results Relative to a Single Core

Distributed-memory parallel processing initially decomposes the model into domains. Typically, the number of domains matches the number of cores. Operational randomness and numerical round-off inherent to parallelism can cause slightly different results between runs on the same machine(s) using the same number of cores or different numbers of cores. This difference is often negligible. However, in some cases the difference is appreciable. This sort of behavior is most commonly seen in nonlinear static or transient analyses that are numerically unstable. The more numerically unstable the model is, the more likely the convergence pattern or final results will differ as the number of cores used in the simulation is changed.