MATLAB Distributed Computing Server System Administrator's Guide
R2012b

Product enhancement suggestions
Bug reports
Documentation error reports
Order status, license renewals, passcodes
Sales, pricing, and general information

508-647-7000 (Phone)
508-647-7001 (Fax)

The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098
For contact information about worldwide offices, see the MathWorks Web site.

MATLAB Distributed Computing Server System Administrator's Guide
COPYRIGHT 2005-2012 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc. FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement, and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information.
Revision History
November 2005: Online only. New for Version 2.0 (Release 14SP3+)
December 2005: Online only. Revised for Version 2.0 (Release 14SP3+)
March 2006: Online only. Revised for Version 2.0.1 (Release 2006a)
September 2006: Online only. Revised for Version 3.0 (Release 2006b)
March 2007: Online only. Revised for Version 3.1 (Release 2007a)
September 2007: Online only. Revised for Version 3.2 (Release 2007b)
March 2008: Online only. Revised for Version 3.3 (Release 2008a)
October 2008: Online only. Revised for Version 4.0 (Release 2008b)
March 2009: Online only. Revised for Version 4.1 (Release 2009a)
September 2009: Online only. Revised for Version 4.2 (Release 2009b)
March 2010: Online only. Revised for Version 4.3 (Release 2010a)
September 2010: Online only. Revised for Version 5.0 (Release 2010b)
April 2011: Online only. Revised for Version 5.1 (Release 2011a)
September 2011: Online only. Revised for Version 5.2 (Release 2011b)
March 2012: Online only. Revised for Version 6.0 (Release 2012a)
September 2012: Online only. Revised for Version 6.1 (Release 2012b)
Contents

1 Introduction
  Product Description
    Key Features
  Product Overview
    Parallel Computing Concepts
    Determining Product Installation and Versions
  Toolbox and Server Components
    Schedulers, Workers, and Clients
    Third-Party Schedulers
    Components on Mixed Platforms or Heterogeneous Clusters
    mdce Service
  Using Parallel Computing Toolbox Software

2 Network Administration
  Prepare for Parallel Computing
    Plan Your Network Layout
    Network Requirements
    Fully Qualified Domain Names
    Security Considerations
  Install and Configure
  Use a Different MPI Build on UNIX Operating Systems
    Build MPI
    Use Your MPI Build
  Shut Down a Job Manager Cluster
    UNIX and Macintosh Operating Systems
    Microsoft Windows Operating Systems
  Custom Startup Parameters
    Define Script Defaults
    Override Script Defaults
  Access Service Record Files
    Locate Log Files
    Locate Checkpoint Folders
  Set MJS Cluster Security
    Set the Security Level
    Local, MJS, and Network Passwords
    Set Secure Communication
  Troubleshoot Common Problems
    License Errors
    Memory Errors on UNIX Operating Systems
    Run Server Processes on Windows Network Installation
    Required Ports
    Ephemeral TCP Ports with Job Manager
    Host Communications Problems
    Verify Multicast Communications

3 Product Installation
  Install Products and Choose Cluster Configuration
    Cluster Description
    Install Products
    Configure Your Cluster
  Configure for an MJS
    Configure Cluster to Use a MATLAB Job Scheduler (MJS)
    Configure Windows Firewalls on Client
    Validate Installation with MJS
  Configure for HPC Server
    Configure Cluster for Microsoft Windows HPC Server
    Configure Client Computer for HPC Server 2008
    Validate Installation Using Microsoft Windows HPC Server
  Configure for Supported Third-Party Schedulers (PBS Pro, Platform LSF, TORQUE)
    Configure Platform LSF Scheduler on Windows Cluster
    Configure Windows Firewalls on Client
    Validate Installation Using an LSF, PBS Pro, or TORQUE Scheduler
  Configure for a Generic Scheduler
    Interfacing with Generic Schedulers
    Configure Generic Scheduler on Windows Cluster
    Configure Sun Grid Engine on Linux Cluster
    Configure Windows Firewalls on Client
    Validate Installation Using a Generic Scheduler

4 Admin Center
  Start Admin Center
  Set Up Resources
    Add Hosts
    Start mdce Service
    Start an MJS
    Start Workers
    Stop, Destroy, Resume, Restart Processes
    Move a Worker
    Update the Display
  Test Connectivity
  Export and Import Sessions
  Prepare for Cluster Profiles

5 Control Script Reference
  mdce Process Control
  Job Manager Control
  Worker Control

Glossary

Index
1 Introduction

- Product Description
- Product Overview
- Toolbox and Server Components
- Using Parallel Computing Toolbox Software
Product Description
Perform MATLAB and Simulink computations on clusters, clouds, and grids.

MATLAB Distributed Computing Server lets you run computationally intensive MATLAB programs and Simulink models on computer clusters, clouds, and grids. You develop your program or model on a multicore desktop computer using Parallel Computing Toolbox and then scale up to many computers by running it on MATLAB Distributed Computing Server. The server supports batch jobs, parallel computations, and distributed large data. The server includes a built-in cluster job scheduler and provides support for commonly used third-party schedulers. MATLAB Distributed Computing Server provides licenses for all MathWorks toolboxes and blocksets, so you can run your MATLAB programs on a cluster without having to separately acquire additional product-specific licenses for each computer in the cluster.
Key Features

- Access to all eligible licensed toolboxes or blocksets with a single server license on the distributed computing resource
- Execution of GPU-enabled functions on distributed computing resources
- Execution of parallel computations from applications and software components generated using MATLAB Compiler on distributed computing resources
- Support for all hardware platforms and operating systems supported by MATLAB and Simulink
- Application scheduling using a built-in job scheduler or third-party schedulers such as Platform LSF, Microsoft Windows HPC Server 2008, Altair PBS Pro, and TORQUE
Product Overview

In this section:
- Parallel Computing Concepts
- Determining Product Installation and Versions
[Figure: A MATLAB client running Parallel Computing Toolbox communicates through a scheduler with multiple MATLAB workers, each running MATLAB Distributed Computing Server.]

Determining Product Installation and Versions

To determine if Parallel Computing Toolbox software is installed on your system, enter the ver command at the MATLAB prompt:

ver
When you enter this command, MATLAB displays information about the version of MATLAB you are running, including a list of all toolboxes installed on your system and their version numbers. You can run the ver command as part of a task in a distributed or parallel application to determine what version of MATLAB Distributed Computing Server software is installed on a worker machine. Note that the toolbox and server software must be the same version.
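For example, a minimal sketch of checking the server-side version from a task (the profile name 'MyMJS' is a hypothetical placeholder for your own cluster profile):

% Query the distcomp (PCT/MDCS) version on a worker.
% 'MyMJS' is an assumed profile name; substitute your own.
c = parcluster('MyMJS');
j = createJob(c);
createTask(j, @ver, 1, {'distcomp'});   % version info for this product
submit(j); wait(j);
out = fetchOutputs(j);                  % one cell per task
disp(out{1}(1).Version)                 % version reported by the worker
delete(j);                              % clean up job data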
Toolbox and Server Components

Schedulers, Workers, and Clients

[Figure: Basic interactions: each client submits a job to the scheduler, the scheduler distributes the job's tasks to the workers, and all results return through the scheduler to the client.]
A large network might include several MJS sessions as well as several client sessions. Any client session can create, run, and access jobs on any MJS, but a worker session is registered with and dedicated to only one MJS at a time. The following figure shows a configuration with multiple MJS processes.
[Figure: A configuration with multiple clients and multiple MJS processes (schedulers); each worker is registered with exactly one MJS, while any client can access any MJS.]
Third-Party Schedulers

As an alternative to using the MJS, you can use a third-party scheduler. This could be a Microsoft Windows HPC Server (including CCS), Platform LSF, PBS Pro, or TORQUE scheduler, or a generic scheduler interface.
Who administers your cluster? The person administering your cluster might have a preference for how jobs are scheduled.
In a mixed platform environment, be sure to follow the proper installation instructions for each local machine on which you are installing the software.
mdce Service
If you are using the MJS, every machine that hosts a worker or MJS session must also run the mdce service. The mdce service recovers worker and MJS sessions when their host machines crash. If a worker or MJS machine crashes, when mdce starts up again (usually configured to start at machine boot time), it automatically restarts the MJS and worker sessions to resume their sessions from before the system crash.
Using Parallel Computing Toolbox Software

A typical Parallel Computing Toolbox session that runs work on the server includes these steps:

1 Find an MJS or Scheduler. Your network may have one or more MJS sessions or schedulers available (but usually only one scheduler). The function you use to find an MJS or scheduler creates an object in your current MATLAB session to represent the MJS or scheduler that will run your job.
2 Create a Job. You create a job to hold a collection of tasks. The job exists on the MJS (or scheduler's data location), but a job object in the local MATLAB session represents that job.

3 Create Tasks. You create tasks to add to the job. Each task of a job can be represented by a task object in your local MATLAB session.

4 Submit a Job to the Job Queue. When your job has all its tasks defined, you submit it to the queue in the MJS or scheduler. The MJS or scheduler distributes your job's tasks to the worker sessions for evaluation. When the workers have completed the job's tasks, the job moves to the finished state.

5 Retrieve the Job's Results. The resulting data from the evaluation of the job is available as a property value of each task object.
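Expressed as a minimal sketch in MATLAB (the profile name 'MyMJS' and the task function are illustrative assumptions, not part of this guide's cluster):

% The full sequence: find, create, submit, retrieve.
c = parcluster('MyMJS');              % 1. find the MJS via its profile
j = createJob(c);                     % 2. create a job
createTask(j, @sum, 1, {[1 2 3]});    % 3. add tasks to the job
createTask(j, @sum, 1, {[4 5 6]});
submit(j);                            % 4. submit the job to the queue
wait(j);                              %    block until the finished state
out = fetchOutputs(j);                % 5. retrieve results: {6; 15}
delete(j);                            % remove job data from the cluster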
2 Network Administration

This chapter provides information useful for network administration of Parallel Computing Toolbox software and MATLAB Distributed Computing Server software.

- Prepare for Parallel Computing
- Install and Configure
- Use a Different MPI Build on UNIX Operating Systems
- Shut Down a Job Manager Cluster
- Custom Startup Parameters
- Access Service Record Files
- Set MJS Cluster Security
- Troubleshoot Common Problems
The server software includes the mdce service or daemon. The mdce service is separate from the worker and job manager processes, and it must be
running on all machines that run job manager sessions or workers that are registered with a job manager. (The mdce service is not used with third-party schedulers.) You can install both toolbox and server software on the same machine, so that one machine can run both client and server sessions.
Network Requirements
To view the network requirements for MATLAB Distributed Computing Server software, visit the product requirements page on the MathWorks Web site at
http://www.mathworks.com/products/distriben/requirements.html
Security Considerations
The parallel computing products do not provide any security measures. Therefore, be aware of the following security considerations:

- MATLAB workers run as whatever user the administrator starts the node's mdce service under. By default, the mdce service starts as root on UNIX operating systems, and as LocalSystem on Microsoft Windows operating systems. Because MATLAB provides system calls, users can submit jobs that execute shell commands.
- The mdce service does not enforce any access control or authentication. Anyone with local or remote access to the mdce services can start and stop their workers and job managers, and query for their status.
- The job manager does not restrict access to the cluster, nor to job and task data. Using a third-party scheduler instead of the MathWorks job manager could allow you to take advantage of the security measures it provides.
- The parallel computing processes must all be on the same side of a firewall, or you must take measures to enable them to communicate with each other through the firewall. Workers running tasks of the same parallel job cannot be firewalled off from each other, because their MPI-based communication will not work.
- If certain ports are restricted, you can specify the ports used for parallel computing. See Define Script Defaults on page 2-13.
- If your network supports multicast, the parallel computing processes accommodate multicast. However, because multicast is disabled on many networks for security reasons, you might require unicast communication between parallel computing processes. Most examples of parallel computing scripts and functions in this documentation show unicast usage.
- If your organization is a member of the Internet Multicast Backbone (MBone), make sure that your parallel computing cluster is isolated from MBone access if you are using multicast for parallel computing. This is generally the default condition. If you have any questions about MBone membership, contact your network administrator.
Build MPI
This stage outlines the steps for creating an MPI build that differs from the one provided with Parallel Computing Toolbox. If you already have an alternative MPI build, proceed to Use Your MPI Build on page 2-6.
1 Unpack the MPI sources into the target file system on your machine. For
example, suppose you have downloaded mpich2-distro.tgz and want to unpack it into /opt for building:
# cd /opt
# mkdir mpich2 && cd mpich2
# tar zxvf path/to/mpich2-distro.tgz
# cd mpich2-1.0.8
2 Build your MPI using the --enable-sharedlibs option (this is vital, as you must build a shared-library MPI, binary compatible with MPICH2-1.0.8 for R2009b and later). For example, the following commands build an MPI with the nemesis channel device and the gforker launcher.

# ./configure --prefix=/opt/mpich2/mpich2-1.0.8 \
    --enable-sharedlibs=gcc \
    --with-device=ch3:nemesis \
    --with-pm=gforker 2>&1 | tee log
# make 2>&1 | tee -a log
# make install 2>&1 | tee -a log
Use Your MPI Build
1 Test your build by running the mpiexec executable. The build should be
ready to test if its bin/mpiexec and lib/libmpich.so are available in the MPI installation location. Following the example in Build MPI on page 2-6, /opt/mpich2/mpich2-1.0.8/bin/mpiexec and /opt/mpich2/mpich2-1.0.8/lib/libmpich.so are ready to use, so you can test the build with:
$ /opt/mpich2/mpich2-1.0.8/bin/mpiexec -n 4 hostname
2 Create an mpiLibConf function to direct Parallel Computing Toolbox to
use your new MPI. Write your mpiLibConf.m to return the appropriate information for your build. For example:
function [primary, extras] = mpiLibConf
primary = '/opt/mpich2/mpich2-1.0.8/lib/libmpich.so';
extras  = {};
The primary path must be valid on the cluster; and your mpiLibConf.m file must be higher on the cluster workers' path than matlabroot/toolbox/distcomp/mpi. (Sending mpiLibConf.m as a file dependency for this purpose does not work. You can get the mpiLibConf.m function on the worker path by either moving the file into a folder on the path, or by having the scheduler use cd in its command so that it starts the MATLAB worker from within the folder that contains the function.)
3 Determine necessary daemons and command-line options.
- Determine all necessary daemons (often something like mpdboot or smpd). The gforker build example in this section uses an MPI that needs no services or daemons running on the cluster, but it can use only the local machine.
- Determine the correct command-line options to pass to mpiexec.
4 Use one of the following options to set up your scheduler to use your new
MPI build: For the simplest case of the mpiexec scheduler, set up a configuration to use the mpiexec executable from your new MPI build. It is crucial that you use matching mpiexec, MPI library, and any daemons (if
any), together. Set the configuration's MpiexecFileName property to /opt/mpich2/mpich2-1.0.8/bin/mpiexec.

If you are using a third-party scheduler (either fully supported or via the generic interface), modify your parallel wrapper script to pick up the correct mpiexec. Additionally, there may be a stage in the wrapper script where the MPI daemons are launched. The parallel submission wrapper script must:

- Determine which nodes are allocated by the scheduler.
- Start required daemon processes. For example, for the MPD process manager this means calling "mpdboot -f <nodefile>".
- Define which mpiexec executable to use for starting workers.
- Stop the daemon processes. For example, for the MPD process manager this means calling "mpdallexit".

For examples of parallel wrapper scripts, see the subfolders of matlabroot/toolbox/distcomp/examples/integration/; specifically, for an example for Sun Grid Engine, look in the subfolder sge/shared for parallelJobWrapper.sh. Wrapper scripts are available for various schedulers and file-sharing configurations. Adopt and modify the appropriate script for your particular cluster usage.
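A skeletal sketch of those stages, assuming the MPD-based build from the earlier example (the shell variables are placeholders that your scheduler integration would supply; this is not a drop-in script):

#!/bin/sh
# Hypothetical parallel wrapper sketch for an MPD-based MPICH2 build.
MPI_HOME=/opt/mpich2/mpich2-1.0.8
# 1. Nodes allocated by the scheduler (placeholder variable).
NODEFILE=$SCHEDULER_NODEFILE
# 2. Start the MPI daemons on those nodes.
mpdboot -f "$NODEFILE"
# 3. Launch workers with the matching mpiexec; $WORKER_CMD stands in
#    for the worker command line that MDCS constructs.
"$MPI_HOME/bin/mpiexec" -n "$NUM_WORKERS" $WORKER_CMD
# 4. Stop the daemons when the job finishes.
mpdallexit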
Shut Down a Job Manager Cluster

UNIX and Macintosh Operating Systems

1 To shut down the job manager, enter the following commands:

cd matlabroot/toolbox/distcomp/bin
stopjobmanager -remotehost <job manager hostname> -name <job manager name> -v
If you have more than one job manager running, stop each of them individually by host and name. For a list of all options to the script, type
stopjobmanager -help
2 For each MATLAB worker you want to shut down, enter the commands:

cd matlabroot/toolbox/distcomp/bin
stopworker -remotehost <worker hostname> -name <worker name> -v
If you have more than one worker session running, you can stop each of them individually by host and name.
stopworker -name worker1 -remotehost <worker hostname>
stopworker -name worker2 -remotehost <worker hostname>
1 Stop the mdce daemon:

/etc/init.d/mdce stop
2 Remove the installed link to prevent the daemon from starting up again
at system reboot:
cd /etc/init.d/
rm mdce
Stop the Daemon Manually. If you used the alternative manual startup of the mdce daemon, use the following commands to stop it manually:
cd matlabroot/toolbox/distcomp/bin
mdce stop
Microsoft Windows Operating Systems

1 To shut down the job manager, enter the following commands at a DOS command prompt:

cd matlabroot\toolbox\distcomp\bin
stopjobmanager -remotehost <job manager hostname> -name <job manager name> -v
If you have more than one job manager running, stop each of them individually by host and name. For a list of all options to the script, type
stopjobmanager -help
2 For each MATLAB worker you want to shut down, enter the commands
cd matlabroot\toolbox\distcomp\bin
stopworker -remotehost <worker hostname> -name <worker name> -v
If you have more than one worker session running, you can stop each of them individually by host and name.
stopworker -remotehost <worker hostname> -name <worker1 name>
stopworker -remotehost <worker hostname> -name <worker2 name>
Stop the mdce Service. To stop the mdce service while leaving the machine on, enter the following commands at a DOS command prompt:
cd matlabroot\toolbox\distcomp\bin
mdce stop
If you plan to uninstall the MATLAB Distributed Computing Server product from a machine, you might want to uninstall the mdce service also, because you no longer need it. You do not need to stop the service before uninstalling it. To uninstall the mdce service, enter the following commands at a DOS command prompt:
cd matlabroot\toolbox\distcomp\bin
mdce uninstall
Custom Startup Parameters

Define Script Defaults
Note If you want to run more than one job manager on the same machine, they must all have unique names. Specify the names using flags with the startup commands.
Parameter: MDCEUSER
Description: Set this parameter to run the mdce services as a user different from the user who starts the service. On a UNIX operating system, set the value before starting the service; on a Windows operating system, set it before installing the service.

Parameter: MDCEPASS
Description: On a Windows operating system, set this parameter to specify the password for the user identified in the MDCEUSER parameter; otherwise, the system prompts you for the password when the service is installed.
On UNIX operating systems, MDCEUSER requires that the current machine has the sudo utility installed, and that the current user be allowed to use sudo to execute commands as the user identified by MDCEUSER. For further information, refer to your system documentation on the sudo and sudoers utilities (for example, man sudo and man sudoers). The MDCEUSER is granted these permissions on Windows systems:
Local Security Settings Policy: Log on as a service
Purpose: Required to log on using the service logon type.

Local Security Settings Policy: Replace a process level token
Purpose: Required to start a process under a different user account.

Local Security Settings Policy: Adjust memory quotas for a process
Purpose: Required to start a process under a different user account.

This list indicates which policies are affected by MDCEUSER. Double-click any of the listed policies in the Local Security Settings GUI to alter its setting or remove a user from that policy.
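For example, a minimal sketch of the corresponding line in mdce_def.sh (the account name is a placeholder):

# Hypothetical example: run the mdce services as a dedicated account
# instead of root. Set this before starting the service on UNIX.
MDCEUSER=clusteruser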
Alternatively, you can make a copy of this file, modify the copy, and specify that this copy be used for the default parameters. On UNIX or Macintosh operating systems, enter the command
mdce start -mdcedef my_mdce_def.sh
If you specify a new mdce_def file instead of the default file for the service on one computer, the new file is not automatically used by the mdce service on other computers. If you want to use the same alternative file for all your mdce services, you must specify it for each mdce service you install or start. For more information, see Define Script Defaults on page 2-13. Note The startup script flags take precedence over the settings in the mdce_def file.
Access Service Record Files

Locate Log Files

Windows. You can set alternative locations for the log files by modifying the LOGBASE setting in the mdce_def.bat file before starting the mdce service.

UNIX and Macintosh. The default location of the log files is /var/log/mdce/. You can set alternative locations for the log files by modifying the LOGBASE setting in the mdce_def.sh file before starting the mdce service.
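For example, a sketch of an alternative log location in mdce_def.sh (the path is a placeholder for your site):

# Hypothetical example: collect mdce logs under a site-specific folder.
LOGBASE=/shared/cluster/logs/mdce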
Locate Checkpoint Folders

Windows. The default location of the checkpoint folders is <TEMP>\MDCE\Checkpoint, where <TEMP> is the value of the system TEMP variable. For example, if TEMP is set to C:\TEMP, the checkpoint folders are placed in C:\TEMP\MDCE\Checkpoint. You can set alternative locations for the checkpoint folders by modifying the CHECKPOINTBASE setting in the mdce_def.bat file before starting the mdce service.

UNIX and Macintosh. The checkpoint folders are placed by default in /var/lib/mdce/. You can set alternative locations for the checkpoint folder by modifying the CHECKPOINTBASE setting in the mdce_def.sh file before starting the mdce service.
Set MJS Cluster Security

Set the Security Level

Security Level 2: Job manager (MJS) password protection on jobs.
Description: Jobs and tasks are identified with the submitting user, and are password protected. Other users cannot access your jobs. Tasks run as the user who started the mdce process on the worker machines (typically root or Local System).
User Requirements: When you start the job manager (MJS), it prompts you to provide a new password for that job manager's admin account, which can be used for accessing all users' jobs and tasks. A dialog box requires you to establish a user name and password when you first access the job manager (MJS) from the MATLAB client. Your job manager (MJS) user name and password do not have to match your system/network user name and password.

Security Level 3: In addition to the security of level 2, tasks run as the submitting user on worker machines.
Description: Jobs and tasks are identified with the submitting user, and are password protected. Other users cannot access your jobs. Tasks run as the user who submitted the job.
User Requirements: On UNIX systems, the mdce process on the cluster nodes must be started by the root user. The job manager (MJS) must use secure communication with the workers (set in the mdce_def file). When you start the job manager (MJS), it prompts you to provide a new password for that job manager's admin account, which can be used for accessing all users' jobs and tasks. A dialog box requires you to establish a user name and password when you first access the job manager (MJS) from the MATLAB client. Your job manager (MJS) user name and password must be the same as your system/network user name and password, because the worker must log you in to run the task as you.

All users that tasks run as require read and write permissions to the CHECKPOINTBASE folder and all its subfolders. The job manager and the workers should run at the same security level. A worker running at too low a security level will fail to register with the job manager, because the job manager does not trust it.
Set Secure Communication

You must also provide a value for the SHARED_SECRET_FILE parameter in the mdce_def file, identifying where the file can be found from the job manager (MJS) perspective. To create this file, run either script:

matlabroot/toolbox/distcomp/bin/createSharedSecret (UNIX)
matlabroot\toolbox\distcomp\bin\createSharedSecret.bat (Windows)

The secret file establishes trust between the processes on different machines. In a shared file system, all the nodes can point to the same secret file, and they can even all share the same mdce_def file. In a nonshared file system, create a secret file with the provided script, then copy the file to each node and make sure each node's mdce_def file indicates where its particular secret file is located.

Note: Secure communication is required when using job manager (MJS) security level 3.
Troubleshoot Common Problems
License Errors
When starting a MATLAB worker, a licensing problem might result in the message
License checkout failed.
No such FEATURE exists.
License Manager Error -5

There are many reasons why you might receive this error:

- This message usually indicates that you are trying to use a product for which you are not licensed. Look at your license.dat file located within your MATLAB installation to see if you are licensed to use this product.
- If you are licensed for this product, this error may be the result of having extra carriage returns or tabs in your license file. To avoid this, ensure that each line begins with either #, SERVER, DAEMON, or INCREMENT. After fixing your license.dat file, restart your license manager and MATLAB should work properly.
- This error may also be the result of an incorrect system date. If your system date is before the date that your license was made, you will get this error.
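For reference, a hypothetical sketch of well-formed license file lines (the server name, host ID, port, paths, feature entry, and key are all illustrative placeholders; your file will differ):

# Hypothetical license.dat sketch: every line starts with #, SERVER,
# DAEMON, or INCREMENT, with no stray tabs or carriage returns.
SERVER myserver 0123456789ab 27000
DAEMON MLM /usr/local/MATLAB/etc/MLM
INCREMENT MATLAB_Distrib_Comp_Engine MLM 24 01-jan-2014 8 <license key>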
If you receive this error when starting a worker with MATLAB Distributed Computing Server software:
You may be calling the startworker command from an installation that does not have access to a worker license. For example, starting a worker from a client installation of the Parallel Computing Toolbox product causes the following error:
The mdce service on the host hostname returned the following error:
Problem starting the MATLAB worker.
The cause of this problem is:
==============================================================
Most likely, the MATLAB worker failed to start due to a
licensing problem, or MATLAB crashed during startup. Check
the worker log file
/tmp/mdce_user/node_node_worker_05-11-01_16-52-03_953.log
for more detailed information. The mdce log file
/tmp/mdce_user/mdce-service.log
may also contain some additional information.
==============================================================
If you installed only the Parallel Computing Toolbox product, and you are attempting to run a worker on the same machine, you will receive this error because the MATLAB Distributed Computing Server product is not installed, and therefore the worker cannot obtain a license.
Required Ports
With Job Manager
BASE_PORT. The mdce_def file specifies and describes the ports required by the job manager and all workers. See the following file in the MATLAB installation used for each cluster process:

matlabroot/toolbox/distcomp/bin/mdce_def.sh (on UNIX operating systems)
matlabroot\toolbox\distcomp\bin\mdce_def.bat (on Windows operating systems)

Parallel Jobs. On worker machines running a UNIX operating system, the number of ports required by MPICH for the running of parallel jobs ranges from BASE_PORT + 1000 to BASE_PORT + 2000.
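For example, the relevant line in mdce_def.sh might look like the following sketch (27350 is a commonly used default value; confirm the value in your own file):

# Base port for mdce, job manager, and worker services. With this value,
# parallel jobs on UNIX workers use ports 28350 through 29350.
BASE_PORT=27350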
Client Ports
With the pctconfig function, you specify the ports used by the client. If the default ports cannot be used, this function allows you to configure ports separately for communication with the job manager and communication with pmode or a MATLAB pool.
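For example, a minimal sketch (the port range and host name are placeholder values for your site):

% Restrict client communication to a fixed port range, and declare the
% hostname that cluster processes should use to reach this client.
pctconfig('portrange', [27350 27424]);
pctconfig('hostname', 'myclient.example.com');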
Ephemeral TCP Ports with Job Manager

If you use the job manager on a cluster of nodes running Microsoft Windows operating systems, you might need to increase the number of ephemeral TCP ports available:

1 Start the registry editor (for example, by running regedit).
2 Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.
3 In the Registry Editor window, select Edit > New > DWORD Value.
4 In the list of entries on the right, change the new value name to MaxUserPort and press Enter.
5 Right-click the MaxUserPort entry and select Modify.
6 In the Value data field, enter 65534. Select Decimal for the Base value. Click OK.

This parameter controls the maximum port number that is used when a program requests any available user port from the system. Typically, ephemeral (short-lived) ports are allocated between the values of 1024 and 5000 inclusive. This action allows allocation for port numbers up to 65534.

7 Quit the Registry Editor.
8 Reboot your machine.
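The same change can also be scripted; a sketch from an administrator command prompt (verify on your system before use, since this edits the registry directly):

reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort /t REG_DWORD /d 65534 /f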
The results should be the same, showing the same listing of job managers and workers. If the output indicates problems, run the command again with a higher information level to receive more detailed information:
nodestatus -remotehost hostB -infolevel 3
Verify Multicast Communications

The MulticastTester class's main method and its constructor take two input arguments: the multicast group to join and the port number to use. This Java class has a number of simple methods to attempt to join a specified multicast group. Once the class has successfully joined the group, it has methods to send messages to the group, listen for messages from the group, and display what it receives. You can use this class both from a command-line call to Java software and inside MATLAB. From a shell prompt (assuming that java is on your path), type:
java -cp distcomp.jar com.mathworks.toolbox.distcomp.test.MulticastTester
The following example shows how to use the Java class inside MATLAB. Start MATLAB on two machines (e.g., host1name and host2name) for which you want to test multicast. In each MATLAB session, enter the following commands:
m = com.mathworks.toolbox.distcomp.test.MulticastTester('239.1.1.1', 9999);
m.startSendingThread;
m.startListeningThread;
These instructions cause each MATLAB session to issue a stream of multicast test packets, and to listen for test packets. If multicast is working between the machines, you see a stream of lines like the following:
0 : host1name : 0
1 : host2name : 0
2 : host2name : 1
3 : host2name : 2
The number on the left in each string is the line number for the received packet. The text in the center is the host from which the packet is received. The number on the right is the packet number sent by the sending host. It is normal for a host to report a test packet from itself. If either machine does not receive a stream of test packets, or if the remote host is not included in either stream, then multicast communication is not operating properly. To terminate the test stream, execute the following in both MATLAB sessions:
m.stopSendingThread;
m.stopListeningThread;
3 Product Installation

- Install Products and Choose Cluster Configuration
- Configure for an MJS
- Configure for HPC Server
- Configure for Supported Third-Party Schedulers (PBS Pro, Platform LSF, TORQUE)
- Configure for a Generic Scheduler
Install Products and Choose Cluster Configuration
Cluster Description
To set up a cluster, you first install MATLAB Distributed Computing Server (MDCS) on a node called the head node. You can also install the license manager on the head node. After performing this installation, you can then optionally install MDCS on the individual cluster nodes, called worker nodes. You do not need to install the license manager on worker nodes. This figure shows the installations that you perform on your MDCS cluster nodes. This is only one possible configuration. (You can install the cluster license manager and MDCS on separate nodes, but this document does not cover this type of installation.)
[Figure: An MDCS cluster in which the head node runs MDCS and the license manager, and each worker node runs MDCS.]
You install Parallel Computing Toolbox (PCT) software on the computer that you use to write MATLAB applications. This is called the client node. This figure shows the installations that you must perform on client nodes.
[Figure: A client node running MATLAB and Parallel Computing Toolbox, connected to the MDCS cluster.]
Install Products
On the Cluster Nodes
Install the MathWorks products on your cluster as a network installation according to the instructions found at
http://www.mathworks.com/help/base/install/
These instructions include steps for installing, licensing, and activating your installation. You can install in a central location, or individually on each cluster node. Note MathWorks highly recommends installing all MathWorks products on the cluster. MDCS cannot run jobs whose code requires products that are not installed.
On the Client Nodes

Install Parallel Computing Toolbox on each computer that you use to write MATLAB applications, following the same network installation instructions referenced above. These instructions include steps for installing, licensing, and activating your installation.
Configure for an MJS
Note The MATLAB job scheduler (MJS) was formerly known as the MathWorks job manager. The process is the same, is started in the same way, and performs the same functions. In the following instructions, matlabroot refers to the location of your installed MATLAB Distributed Computing Server software. Where you see
this term used in the instructions that follow, substitute the path to your location.
Configure Windows Firewalls. If you are using Windows firewalls on your cluster nodes, log in as a user with administrative privileges and execute the following in a DOS command window:

matlabroot\toolbox\distcomp\bin\addMatlabToWindowsFirewall.bat

This command adds MATLAB as an allowed program. If you are using other firewalls, you must configure them for similar accommodation.

Configure User Access to Installation. The user that mdce runs as requires access to the cluster MATLAB installation location. By default, mdce runs as the user LocalSystem. If your network allows LocalSystem to access the install location, you can proceed to the next step. (If you are not sure of your network configuration and the access provided for LocalSystem, contact the MathWorks install support team.)

Note: If LocalSystem cannot access the install location, you must run mdce as a different user. You can set a different user with these steps:
3-6
1 With any standard text editor (such as WordPad) open the mdce_def file
found at:
matlabroot\toolbox\distcomp\bin\mdce_def.bat
2 Find the line for setting the MDCEUSER parameter, and provide a value in the form domain\username:

set MDCEUSER=mydomain\myusername

3 Provide the password for this user by setting the MDCEPASS parameter:

set MDCEPASS=password
4 Save the file. Proceed to the next step.
1 Open a DOS command window with the necessary privileges:

a If you are using Windows 7 or Windows Vista, you must run the command window with administrator privileges. Click the Windows menu Start > (All) Programs > Accessories; then right-click Command Window, and select Run as Administrator. This option is available only if you are running User Account Control (UAC).
b If you are using Windows XP, open a DOS command window by selecting
the Windows menu Start > Run, then in the Open field, type
cmd
2 In the command window, navigate to the folder of the old installation that contains the control scripts:

cd oldmatlabroot\toolbox\distcomp\bin

3 Stop the mdce service and remove its associated files by typing:

mdce uninstall -clean
Note Using the -clean flag permanently removes all existing job data. Be sure this data is no longer needed before removing it.
4 Repeat the instructions of this step on all worker nodes.
parameters in the matlabroot/toolbox/distcomp/bin/mdce_def.sh file to point to a folder for which you have write privileges: CHECKPOINTBASE, LOGBASE, PIDBASE, and LOCKBASE if applicable.)
2 On each cluster node, stop the mdce service and remove its associated files by typing:

cd oldmatlabroot/toolbox/distcomp/bin
./mdce uninstall -clean
Note Using the -clean flag permanently removes all existing job data. Be sure this data is no longer needed before removing it.
Configure Cluster to Use a MATLAB Job Scheduler (MJS)

- Using Admin Center GUI: Identify Hosts and Start the mdce Service; Start the MJS; Start the Workers
- Using the Command-Line Interface (Windows): Start the mdce Service; Start the MJS; Start the Workers
- Using the Command-Line Interface (UNIX): Start the mdce Service; Start the MJS; Start the Workers

Using Admin Center GUI.

Note: To use Admin Center, you must run it on a computer that has direct network connectivity to all the nodes of your cluster. If you cannot run Admin Center on such a computer, follow the instructions in Using the Command-Line Interface (Windows) or Using the Command-Line Interface (UNIX).
Note: To start the mdce service on remote machines from Admin Center, you must run Admin Center as a user who has administrator privileges on all the machines.

If there are no past sessions of Admin Center saved for you, the GUI opens with a blank listing, superimposed by a welcome dialog box, which provides information on how to get started.
The following figure shows an example using host names node1, node2, node3, and node4. In your case, use your own host names.

Proceed through the steps, clicking Next and checking the settings at each step. For most settings, the default is appropriate.
It might take a moment for Admin Center to communicate with all the nodes, start the services, and acquire the status of all of them. When Admin Center completes the update, the listing should look something like the following figure.
5 At this point, you should test the connectivity between the nodes. This assures that your cluster can perform the necessary communications for running other MDCS processes. In the Hosts module, click Test Connectivity.
6 When the Connectivity Testing dialog box opens, it shows the results of the
last test, if there are any. Click Run to run the tests and generate new data.
If any of the connectivity tests fail, double-click the icon that indicates a failure to get information about that specific test; or use the Log tab to get all test results. With this information, you can refer to the troubleshooting section of the MATLAB Distributed Computing Server System Administrator's Guide. If you need further help, contact the MathWorks install support team.
7 If your tests pass, click Close to return to the Admin Center GUI.
Start the MJS.

1 To start an MJS, click Start in the MJS module. (This is one of several ways to open the New MJS dialog box.) In the New MJS dialog box, specify a name and host for your MJS. This example shows an MJS called MyMJS to run on host node1.
2 Click OK to start the MJS and return to the Admin Center GUI.
Start the Workers.

1 To start workers, click Start in the Workers module.
2 Specify the number of workers to start on each host. The number is up to you, but you cannot exceed the total number of licenses you have. A good starting value might be to start one worker per computational core on your hosts.
3 Select the hosts to start the workers on. Click Select All if you want to start workers on all listed hosts.
4 Select the MJS for these workers to register with. If you have only one MJS in this Admin Center session, that is the default.

The following example shows a setup for starting eight workers on four hosts (two workers each). Your names and numbers will vary.
5 Click OK to start the workers and return to the Admin Center dialog box.
It might take a moment for Admin Center to initialize all the workers and acquire their status. When all the workers are started, Admin Center looks something like the following figure. If your workers are all idle and connected, your cluster is ready for use.
If you encounter any problems or failures, contact the MathWorks install support team. For more information about Admin Center functionality, such as stopping processes or saving sessions, see the Admin Center chapter in the MATLAB Distributed Computing Server System Administrator's Guide.

Using the Command-Line Interface (Windows).

Start the mdce Service. You must install the mdce service on all nodes (head node and worker nodes). Begin on the head node.
1 Open a DOS command window with the necessary privileges: a If you are using Windows 7 or Windows Vista, you must run the
command window with administrator privileges. Click the Windows menu Start > (All) Programs > Accessories; then right-click Command Window, and select Run as Administrator. This option is available only if you are running User Account Control (UAC).
b If you are using Windows XP, open a DOS command window by selecting
the Windows menu Start > Run, then in the Open field, type:
cmd
2 In the DOS command window, navigate to the folder with the control
scripts:
cd matlabroot\toolbox\distcomp\bin
3 Install the mdce service by typing the command:
mdce install
4 Start the mdce service by typing the command:
mdce start
5 Repeat the instructions of this step on all worker nodes.
As an alternative to items 3-5, you can install and start the mdce service on several nodes remotely from one machine by typing:
cd matlabroot\toolbox\distcomp\bin
remotemdce install -remotehost hostA,hostB,hostC . . .
remotemdce start -remotehost hostA,hostB,hostC . . .
where hostA,hostB,hostC refers to a list of your host names. Note that there are no spaces between host names, only a comma. If you need to indicate protocol, platform (such as in a mixed environment), or other information, see the help for remotemdce by typing:
remotemdce -help
Once installed, the mdce service starts running each time the machine reboots. The mdce service continues to run until explicitly stopped or uninstalled, regardless of whether an MJS or worker session is running. Start the MJS. To start the MATLAB job scheduler (MJS), enter the following commands in a DOS command window. You do not have to be at the machine on which the MJS runs, as long as you have access to the MDCS installation.
1 In your DOS command window, navigate to the folder with the startup
scripts:
cd matlabroot\toolbox\distcomp\bin
2 Start the MJS, using any unique text you want for the name <MyMJS>:

startjobmanager -name <MyMJS> -remotehost <MJS host name> -v

Note: If you are executing startjobmanager on the host where the MJS runs, you do not need to specify the -remotehost flag. If you have more than one MJS on your cluster, each must have a unique name.
Start the Workers.

Note: Before you can start a worker on a machine, the mdce service must already be running on that machine, and the license manager for MATLAB Distributed Computing Server must be running on the network.

For each node used as a worker, enter the following commands in a DOS command window. You do not have to be at the machines where the MATLAB workers will run, as long as you have access to the MDCS installation.
1 Navigate to the folder with the startup scripts:
cd matlabroot\toolbox\distcomp\bin
2 Start the workers on each node, using the text for <MyMJS> that identifies
the name of the MJS you want this worker registered with. Enter this text on a single line:
startworker -jobmanagerhost <MJS host name> -jobmanager <MyMJS> -remotehost <worker host name> -v
To run more than one worker session on the same node, give each worker a unique name by including the -name option on the startworker command, and run it for each worker on that node:
startworker ... -name <worker1 name> startworker ... -name <worker2 name>
3 Verify that the workers are running. Repeat this command for each worker node:

nodestatus -remotehost <worker host name>
For more information about mdce, MJS, and worker processes, such as how to shut them down or customize them, see the Network Administration chapter in the MATLAB Distributed Computing Server System Administrator's Guide.

Using the Command-Line Interface (UNIX).

Start the mdce Service. On each cluster node, start the mdce service by typing the commands:
cd matlabroot/toolbox/distcomp/bin
./mdce start
Alternatively (on Linux, but not Macintosh), you can start the mdce service on several nodes remotely from one machine by typing
cd matlabroot/toolbox/distcomp/bin
./remotemdce start -remotehost hostA,hostB,hostC . . .
where hostA,hostB,hostC refers to a list of your host names. Note that there are no spaces between host names, only a comma. If you need to indicate protocol, platform (such as in a mixed environment), or other information, see the help for remotemdce by typing
./remotemdce -help
Start the MJS. To start the MATLAB job scheduler (MJS), enter the following commands. You do not have to be at the machine on which the MJS runs, as long as you have access to the MDCS installation.
1 Navigate to the folder with the startup scripts:
cd matlabroot/toolbox/distcomp/bin
2 Start the MJS, using any unique text you want for the name <MyMJS>:

./startjobmanager -name <MyMJS> -remotehost <MJS host name> -v

Note: If you have more than one MJS on your cluster, each must have a unique name.
Start the Workers.

Note: Before you can start a worker on a machine, the mdce service must already be running on that machine, and the license manager for MATLAB Distributed Computing Server must be running on the network.

For each computer hosting a MATLAB worker, enter the following commands. You do not have to be at the machines where the MATLAB workers run, as long as you have access to the MDCS installation.
1 Navigate to the folder with the startup scripts:
cd matlabroot/toolbox/distcomp/bin
2 Start the workers on each node, using the text for <MyMJS> that identifies
the name of the MJS you want this worker registered with. Enter this text on a single line:
./startworker -jobmanagerhost <MJS host name> -jobmanager <MyMJS> -remotehost <worker host name> -v
To run more than one worker session on the same machine, give each worker a unique name with the -name option:
./startworker ... -name <worker1> ./startworker ... -name <worker2>
3 Verify that the workers are running. Repeat this command for each worker
node:
./nodestatus -remotehost <worker host name>
For more information about mdce, MJS, and worker processes, such as how to shut them down or customize them, see the Network Administration chapter in the MATLAB Distributed Computing Server System Administrator's Guide.
Step 4: Install the mdce Service to Start Automatically at Boot Time (UNIX)
Although this step is not required, it is helpful in case of a system crash. Once configured for this, the mdce service starts running each time the machine reboots. The mdce service continues to run until explicitly stopped, regardless of whether an MJS or worker session is running. You must have root privileges to do this step. Debian, Fedora Platforms. On each cluster node, register the mdce service as a known service and configure it to start automatically at system boot time by following these steps:
1 Create the following link, if it does not already exist:
ln -s matlabroot/toolbox/distcomp/bin/mdce /etc/mdce
2 Create the following link to the boot script file:
ln -s matlabroot/toolbox/distcomp/bin/mdce /etc/init.d/mdce
3 Set the boot script file permissions:

chmod 555 /etc/init.d/mdce

4 Look in /etc/inittab for the default run level. Create a link in the rc folder associated with that run level. For example, if the run level is 5, execute these commands:
cd /etc/rc5.d; ln -s ../init.d/mdce S99MDCE
SUSE Platform. On each cluster node, register the mdce service as a known service and configure it to start automatically at system boot time by following these steps:
1 Create the following link, if it does not already exist:
ln -s matlabroot/toolbox/distcomp/bin/mdce /etc/mdce
2 Create the following link to the boot script file:
ln -s matlabroot/toolbox/distcomp/bin/mdce /etc/init.d/mdce
3 Set the boot script file permissions:

chmod 555 /etc/init.d/mdce

4 Look in /etc/inittab for the default run level. Create a link in the rc folder associated with that run level. For example, if the run level is 5, execute these commands:
cd /etc/init.d/rc5.d; ln -s ../mdce S99MDCE
Red Hat Platform (non-Fedora). On each cluster node, register the mdce service as a known service and configure it to start automatically at system boot time by following these steps:
1 Create the following link, if it does not already exist:
ln -s matlabroot/toolbox/distcomp/bin/mdce /etc/mdce
2 Create the following link to the boot script file:
ln -s matlabroot/toolbox/distcomp/bin/mdce /etc/init.d/mdce
3 Set the boot script file permissions:

chmod 555 /etc/init.d/mdce

4 Look in /etc/inittab for the default run level. Create a link in the rc folder associated with that run level. For example, if the run level is 5, execute these commands:
cd /etc/rc.d/rc5.d; ln -s ../../init.d/mdce S99MDCE
Macintosh Platform. On each cluster node, register the mdce service as a known service with launchd, and configure it to start automatically at system boot time by following these steps:
1 Navigate to the toolbox folder and stop the running mdce service:
cd matlabroot/toolbox/distcomp/bin
./mdce stop

Configure Windows Firewalls on Client

If you are using Windows firewalls on your client node:
1 Log in as a user with administrative privileges. 2 Execute the following in a DOS command window.
matlabroot\toolbox\distcomp\bin\addMatlabToWindowsFirewall.bat
This command adds MATLAB as an allowed program. If you are using other firewalls, you must configure them for similar accommodation.
1 To check your cluster status, open a DOS command window (for Windows software) or a shell (for UNIX software) and go to the control script folder:
cd matlabroot\toolbox\distcomp\bin (for Windows)
cd matlabroot/toolbox/distcomp/bin (for UNIX)
2 Run nodestatus to verify your cluster communications. Substitute <MJS Host> with the host name of your MJS computer:

nodestatus -remotehost <MJS Host>
If successful, you should see the status of your MJS (job manager) and its workers. Otherwise, refer to the troubleshooting section of the MATLAB Distributed Computing Server System Administrators Guide.
1 Start the Cluster Profile Manager from the MATLAB desktop: on the Home tab, in the Environment area, select Parallel > Manage Cluster Profiles.
2 Create a new profile in the Cluster Profile Manager by selecting New > MJS.
b Set the Host field to the name of the host on which your MJS is running. Depending on your network, this might be only a host name, or it might have to be a fully qualified domain name.
c Set the MJSName field to the name of your MJS, which you started
earlier. So far, the dialog box should look like the following figure:
Validate Installation with MJS

1 If it is not already open, start the Cluster Profile Manager from the MATLAB desktop: on the Home tab, in the Environment area, select Parallel > Manage Cluster Profiles.
2 Select your cluster profile in the listing.
3 Click Validate.
The Validation Results tab shows the output. The following figure shows the results of a profile that passed all validation tests.
Note If your validation does not pass, contact the MathWorks install support team. If your validation passed, you now have a valid profile that you can use in other parallel applications. You can make any modifications to your profile appropriate for your applications, such as NumWorkersRange, AttachedFiles, AdditionalPaths, etc. To save your profile for other users, select the profile
and click Export, then save your profile to a file in a convenient location. Later, when running the Cluster Profile Manager, other users can import your profile by clicking Import.
Configure for HPC Server

Configure Cluster for Microsoft Windows HPC Server

To configure your cluster, run the following command:

MicrosoftHPCServerSetup.bat -cluster

This command performs some of the setup required for all machines in the cluster. The location of the MATLAB installation must be the same on every cluster node.
Note If you need to override the script default values, modify the values defined in MicrosoftHPCServerSetup.xml before running MicrosoftHPCServerSetup.bat. Use the -def_file argument to the script when using a MicrosoftHPCServerSetup.xml file in a custom location. For example:
MicrosoftHPCServerSetup.bat -cluster -def_file <filename>
You modify the file only on the node where you actually run the script. An example of one of the values you might set is for CLUSTER_NAME. If you provide a friendly name for the cluster in this parameter, it is recognized by MATLAB's discover clusters feature and displayed in the resulting cluster list.
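For illustration only, a hypothetical sketch of such a setting (the actual element syntax in MicrosoftHPCServerSetup.xml may differ; follow the comments in the shipped file):

<!-- Hypothetical sketch: a friendly cluster name used by discovery. -->
<CLUSTER_NAME>MyHPCCluster</CLUSTER_NAME>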
Configure Client Computer for HPC Server 2008

To configure your client computer, run the following command:

MicrosoftHPCServerSetup.bat -client

This command performs some of the setup required for a client machine.
Note: If you need to override the default values of the script, modify the values defined in MicrosoftHPCServerSetup.xml before running MicrosoftHPCServerSetup.bat. Use the -def_file argument to the script when using a MicrosoftHPCServerSetup.xml file in a custom location. For example:
MicrosoftHPCServerSetup.bat -client -def_file <filename>
HPC Server client utilities must be installed on your MATLAB client machine. If they are not already installed and up to date, ask your system administrator for the correct client utilities to install. The utilities are available from http://www.microsoft.com/hpc/en/us/default.aspx.
1 Start the Cluster Profile Manager from the MATLAB desktop: on the Home tab, in the Environment area, select Parallel > Manage Cluster Profiles.
2 Create a new profile in the Cluster Profile Manager by selecting New > HPC Server.
b Set the NumWorkers field to the number of workers you want to run.
c Set the Host field to the name of the host on which your scheduler is running. Depending on your network, this might be a simple host name, or it might have to be a fully qualified domain name.

Note: The following four property settings (JobStorageLocation, ClusterMatlabRoot, ClusterVersion, and UseSOAJobSubmission) are optional, and need to be set here in the profile only if you did not run MicrosoftHPCServerSetup.bat as described in Configure Cluster for Microsoft Windows HPC Server on page 3-29, or if you want to override the setting established by that script.
d Set the JobStorageLocation to the location where you want job
and task data to be stored. This must be accessible to all the worker machines. Note JobStorageLocation should not be shared by parallel computing products running different versions; each version on your cluster should have its own JobStorageLocation.
e Set the ClusterMatlabRoot to the installation location of the MATLAB
Set the ClusterVersion field to HPCServer2008 or CCS. set UseSOAJobSubmission to true. Otherwise leave the setting Use default or false. If you plan on using SOA job submissions with your cluster, you should test this first without SOA submission, then later return and test it with SOA job submission. So far, the dialog box should look like the following figure:
g If you want to test SOA job submissions on an HPC Server 2008 cluster,
1 Start the Cluster Profile Manager from the MATLAB desktop by selecting, on the Home tab in the Environment area, Parallel > Manage Cluster Profiles.

2 Select your cluster profile in the listing.

3 Click Validate.

The Validation Results tab shows the output. The following figure shows the results of a profile that passed all validation tests.
Note If your validation does not pass, contact the MathWorks install support team.

If your validation passed, you now have a valid profile that you can use in other parallel applications. You can make any modifications to your profile appropriate for your applications, such as NumWorkersRange, AttachedFiles, AdditionalPaths, etc.

To save your profile for other users, select the profile and click Export, then save your profile to a file in a convenient location. Later, when running the Cluster Profile Manager, other users can import your profile by clicking Import.
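In addition to the GUI validation, a short programmatic smoke test can confirm end-to-end job submission; a sketch, assuming the profile is named MyHPCProfile (the name is hypothetical):

% Submit a tiny batch job through the new profile and fetch its result.
c = parcluster('MyHPCProfile');
j = batch(c, @rand, 1, {3});   % evaluate rand(3) on one worker
wait(j);
r = fetchOutputs(j);           % 1-by-1 cell array holding the 3-by-3 result
delete(j);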
Configure for Supported Third-Party Schedulers (PBS Pro, Platform LSF, TORQUE)
In this section...
Configure Platform LSF Scheduler on Windows Cluster on page 3-35
Configure Windows Firewalls on Client on page 3-38
Validate Installation Using an LSF, PBS Pro, or TORQUE Scheduler on page 3-38
Note You must use the generic scheduler interface for any of the following:
- Any third-party scheduler not listed above (e.g., Sun Grid Engine, GridMP, etc.)
- PBS other than PBS Pro
- A nonshared file system when the client cannot directly submit to the scheduler (e.g., TORQUE on Windows)
To use mpiexec to distribute a job, the smpd service must be running on all nodes that will be used for running MATLAB workers.

Note The smpd executable does not support running from a mapped drive. Use either a local installation, or the full UNC pathname to the executable. Microsoft Windows Vista does not support the smpd executable on network share installations, so with Vista the installation must be local.

Choose one of the following configurations:
- Without Delegation on page 3-36
- Using Passwordless Delegation on page 3-37
Without Delegation
1 Log in as a user with administrator privileges.

2 Start smpd by typing one of the following in a DOS command window, as appropriate:
matlabroot\bin\win32\smpd -install
or
matlabroot\bin\win64\smpd -install
This command installs the service and starts it. As long as the service remains installed, it will start each time the node boots.
3 If this is a worker machine and you did not run the installer on it to install
MDCS software (for example, if you are running MDCS software from a shared installation), execute the following command in a DOS command window.
matlabroot\bin\matlab.bat -install_vcrt
This command installs the Microsoft run-time libraries needed for running distributed and parallel jobs with your scheduler.
4 If you are using Windows firewalls on your cluster nodes, execute the provided command to add MATLAB as an allowed program. If you are using other firewalls, you must configure them to make similar accommodation.
5 Log in as the user who will be submitting jobs for execution on this node.

6 Register this user to use mpiexec by typing one of the following, as appropriate:
matlabroot\bin\win32\mpiexec -register
or
matlabroot\bin\win64\mpiexec -register
7 Repeat steps 5 and 6 for all users who will run jobs on this machine.

8 Repeat all these steps on all Windows nodes in your cluster.
Using Passwordless Delegation

1 Log in as a user with administrator privileges.

2 Start smpd by typing one of the following in a DOS command window, as appropriate:
matlabroot\bin\win32\smpd -register_spn
or
matlabroot\bin\win64\smpd -register_spn
This command installs the service and starts it. As long as the service remains installed, it will start each time the node boots.
3 If this is a worker machine and you did not run the installer on it to install MDCS software (for example, if you are running MDCS software from a shared installation), execute the following command in a DOS command window.

matlabroot\bin\matlab.bat -install_vcrt

This command installs the Microsoft run-time libraries needed for running distributed and parallel jobs with your scheduler.
4 If you are using Windows firewalls on your cluster nodes, execute the provided command to add MATLAB as an allowed program. If you are using other firewalls, you must configure them for similar accommodation.
5 Repeat these steps on all Windows nodes in your cluster.
Configure Windows Firewalls on Client

If you are using Windows firewalls on your client machine, execute the provided command to add MATLAB as an allowed program. If you are using other firewalls, you must configure them for similar accommodation.
Validate Installation Using an LSF, PBS Pro, or TORQUE Scheduler
1 Start the Cluster Profile Manager from the MATLAB desktop by selecting, on the Home tab in the Environment area, Parallel > Manage Cluster Profiles.

2 Create a new profile in the Cluster Profile Manager by selecting New > LSF (or PBS Pro or TORQUE, as appropriate), then edit its properties:

b Set the JobStorageLocation to the location where you want job and task data to be stored (accessible to all the worker machines if you have a shared file system).

Note JobStorageLocation should not be shared by parallel computing products running different versions; each version on your cluster should have its own JobStorageLocation.

c Set the NumWorkers field to the number of workers you want to run.

d If using LSF, set the OperatingSystem to the operating system of your worker machines.

The dialog box should look something like this, or slightly different for PBS Pro or TORQUE schedulers.
1 Start the Cluster Profile Manager from the MATLAB desktop by selecting, on the Home tab in the Environment area, Parallel > Manage Cluster Profiles.

2 Select your cluster profile in the listing.

3 Click Validate.

The Validation Results tab shows the output. The following figure shows the results of a profile that passed all validation tests.
Note If your validation does not pass, contact the MathWorks install support team.

If your validation passed, you now have a valid profile that you can use in other parallel applications. You can make any modifications to your profile appropriate for your applications, such as NumWorkersRange, AttachedFiles, AdditionalPaths, etc.

To save your profile for other users, select the profile and click Export, then save your profile to a file in a convenient location. Later, when running the Cluster Profile Manager, other users can import your profile by clicking Import.
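Beyond the GUI validation, a small independent job exercises the same submission path; a sketch, assuming the profile is named MyLSFProfile (the name is hypothetical):

% Create an independent job with one task and retrieve its output.
c = parcluster('MyLSFProfile');
j = createJob(c);
createTask(j, @rand, 1, {3});   % one task returning rand(3)
submit(j);
wait(j);
out = fetchOutputs(j);          % 1-by-1 cell array with the result
delete(j);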
Configure for a Generic Scheduler

Note You must use the generic scheduler interface for any of the following:
- Any third-party scheduler not listed in previous chapters (e.g., Sun Grid Engine, GridMP, etc.)
- PBS other than PBS Pro
- A nonshared file system when the client cannot directly submit to the scheduler (e.g., TORQUE on Windows)
This chapter includes the following sections. Read all that apply to your configuration:

In this section...
Interfacing with Generic Schedulers on page 3-43
Configure Generic Scheduler on Windows Cluster on page 3-44
Configure Sun Grid Engine on Linux Cluster on page 3-47
Configure Windows Firewalls on Client on page 3-48
Validate Installation Using a Generic Scheduler on page 3-48
Support Scripts
To support usage of the generic scheduler interface, templates and scripts are provided with the product in the folder:
matlabroot\toolbox\distcomp\examples\integration (on Windows) matlabroot/toolbox/distcomp/examples/integration (on UNIX)
Subfolders are provided for several different kinds of schedulers, and each of those contains a subfolder for the supported usage modes: shared file system, nonshared file system, or remote submission. Each folder contains a file named README that provides specific instructions on how to use the scripts. For further information on programming independent jobs for generic schedulers, see:
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/bqur7ev-35.html
Submission Mode
The provided scripts support three possible submission modes:

- Shared: When the client machine is able to submit directly to the cluster and there is a shared file system present between the client and the cluster machines.
- Remote Submission: When there is a shared file system present between the client and the cluster machines, but the client machine is not able to submit directly to the cluster (for example, if the scheduler's client utilities are not installed).
- Nonshared: When there is not a shared file system between client and cluster machines.

Before using the support scripts, decide which submission mode describes your particular network setup.
Configure Generic Scheduler on Windows Cluster
Without Delegation
1 Log in as a user with administrator privileges.

2 Start smpd by typing one of the following in a DOS command window, as appropriate:
matlabroot\bin\win32\smpd -install
or
matlabroot\bin\win64\smpd -install
This command installs the service and starts it. As long as the service remains installed, it will start each time the node boots.
3 If this is a worker machine and you did not run the installer on it to install
MDCS software (for example, if you are running MDCS software from a shared installation), execute the following command in a DOS command window.
matlabroot\bin\matlab.bat -install_vcrt
This command installs the Microsoft run-time libraries needed for running distributed and parallel jobs with your scheduler.
4 If you are using Windows firewalls on your cluster nodes, execute the provided command to add MATLAB as an allowed program. If you are using other firewalls, you must configure them to make similar accommodation.
5 Log in as the user who will be submitting jobs for execution on this node.

6 Register this user to use mpiexec by typing one of the following, as appropriate:
matlabroot\bin\win32\mpiexec -register
or
matlabroot\bin\win64\mpiexec -register
7 Repeat steps 5 and 6 for all users who will run jobs on this machine.

8 Repeat all these steps on all Windows nodes in your cluster.
Using Passwordless Delegation

1 Log in as a user with administrator privileges.

2 Start smpd by typing one of the following in a DOS command window, as appropriate:
matlabroot\bin\win32\smpd -register_spn
or
matlabroot\bin\win64\smpd -register_spn
This command installs the service and starts it. As long as the service remains installed, it will start each time the node boots.
3 If this is a worker machine and you did not run the installer on it to install
MDCS software (for example, if you are running MDCS software from a shared installation), execute the following command in a DOS command window.
matlabroot\bin\matlab.bat -install_vcrt
This command installs the Microsoft run-time libraries needed for running distributed and parallel jobs with your scheduler.
4 If you are using Windows firewalls on your cluster nodes, execute the provided command to add MATLAB as an allowed program. If you are using other firewalls, you must configure them for similar accommodation.
5 Repeat these steps on all Windows nodes in your cluster.
Configure Sun Grid Engine on Linux Cluster

To run communicating jobs through Sun Grid Engine (SGE), establish a matlab parallel environment (PE) for the scheduler:

1 Navigate to the folder containing the SGE integration scripts, where the file matlabpe.template is provided.

2 Modify the contents of matlabpe.template to use the desired number of slots and the correct location of the startmatlabpe.sh and stopmatlabpe.sh files. (These files can exist in a shared location accessible by all hosts, or they can be copied to the same local folder on each host.) You can also change other values or add additional values to matlabpe.template to suit your cluster. For more information, refer to the sge_pe documentation provided with your scheduler.

3 Add the matlab parallel environment, using a shell command like:

qconf -Ap matlabpe.template

4 Make the parallel environment runnable on all queues:

qconf -mq all.q

This will bring up a text editor for you to make changes: search for the line pe_list, and add matlab.

5 Ensure you can submit a trivial job to the PE:

echo "hostname" | qsub -pe matlab 1

6 Use qstat to monitor the job. When it finishes, its output file contains the name of the host that ran the job. The default filename for the output file is ~/STDIN.o###, where ### is the SGE job number.

Note The example submit functions for SGE rely on the presence of the matlab parallel environment. If you change the name of the parallel environment to something other than matlab, you must ensure that you also change the submit functions.
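With the parallel environment in place, a communicating job submitted from MATLAB exercises it end to end; a sketch, assuming a generic profile named MySGEProfile configured with the SGE integration scripts (the profile name is hypothetical):

% Run a small SPMD job; each worker returns its lab index.
c = parcluster('MySGEProfile');
j = createCommunicatingJob(c, 'Type', 'spmd');
createTask(j, @labindex, 1, {});
submit(j);
wait(j);
out = fetchOutputs(j)   % one output per participating worker
delete(j);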
Configure Windows Firewalls on Client

If you are using Windows firewalls on your client machine, execute the provided command to add MATLAB as an allowed program. If you are using other firewalls, you must configure them for similar accommodation.
Validate Installation Using a Generic Scheduler

Note The remainder of this chapter illustrates only the case of using LSF in a nonshared file system. For other schedulers or a shared file system, look for the appropriate scripts and modify them as necessary, using the following instructions as a guide. If you have any questions, contact the MathWorks install support team.
These scripts are written for an LSF scheduler, but might require modification to work in your network. The following diagram illustrates the cluster setup:
[Figure: cluster setup. The MATLAB client on the user's desktop copies job data (via sFTP) to the cluster, where the scheduler and the MATLAB workers read and write it.]
In this type of configuration, job data is copied from the client host running a Windows operating system to a host on the cluster (cluster login node) running a UNIX operating system. From the cluster login node, the LSF bsub command submits the job to the scheduler. When the job finishes, its output is copied back to the client host.

Requirements. For this setup to work, the following conditions must be met:
- The client node and cluster login node must support ssh and sFTP.
- The cluster login node must be able to call the bsub command to submit a job to an LSF scheduler.

You can find more about this in the file:
matlabroot\toolbox\distcomp\examples\integration\lsf\nonshared\README
If these requirements are met, use the following steps to implement the solution:
1 Copy the required scripts to a folder on the MATLAB path; the easiest way is to copy them to a folder already on the path. Browse to the folder:
matlabroot\toolbox\distcomp\examples\integration\lsf\nonshared
Copy all the files from that folder, and paste them into the folder:
matlabroot\toolbox\local
2 Start the Cluster Profile Manager from the MATLAB desktop by selecting, on the Home tab in the Environment area, Parallel > Manage Cluster Profiles.

3 Create a new profile in the Cluster Profile Manager by selecting New > Generic, then edit its properties:

b Set the JobStorageLocation to the location where you want job and task data to be stored on the client machine (not the cluster location).

Note JobStorageLocation should not be shared by parallel computing products running different versions; each version on your cluster should have its own JobStorageLocation.
c Set the NumWorkers to the number of workers you want to test your installation on.
d Set the ClusterMatlabRoot to the installation location of the MATLAB to be executed by the worker machines.

e Set the IndependentSubmitFcn to a cell array containing the submit function and its two string arguments, for example:

{@independentSubmitFcn, 'cluster-host-name', '/network/share/joblocation'}

where cluster-host-name is the name of the cluster host from which the job will be submitted to the scheduler, and /network/share/joblocation is the location on the cluster where the scheduler can access job data. This must be accessible from all cluster nodes.
f Set the CommunicatingSubmitFcn similarly, using @communicatingSubmitFcn with the same arguments.
g Set the OperatingSystem to the operating system of your worker machines.

h Set HasSharedFilesystem to false, indicating that the client node and cluster nodes do not share a file system.
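The same properties can be set programmatically on the cluster object; a minimal sketch, assuming the profile above is named MyGenericProfile (names and paths are hypothetical):

% Adjust and save the generic profile without opening the GUI.
c = parcluster('MyGenericProfile');
c.JobStorageLocation = 'C:\Temp\joblocation';
c.NumWorkers = 4;
c.OperatingSystem = 'unix';
c.HasSharedFilesystem = false;
saveProfile(c);   % persist the changes back to the profile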
1 Start the Cluster Profile Manager from the MATLAB desktop by selecting, on the Home tab in the Environment area, Parallel > Manage Cluster Profiles.

2 Select your cluster profile in the listing.

3 Click Validate.

The Validation Results tab shows the output. The following figure shows the results of a profile that passed all validation tests.
Note If your validation fails at any stage, contact the MathWorks install support team.

If your validation passed, you now have a valid profile that you can use in other parallel applications. You can make any modifications to your profile appropriate for your applications, such as NumWorkersRange, AttachedFiles, AdditionalPaths, etc.

To save your profile for other users, select the profile and click Export, then save your profile to a file in a convenient location. Later, when running the Cluster Profile Manager, other users can import your profile by clicking Import.
4
Admin Center
Start Admin Center on page 4-2
Set Up Resources on page 4-3
Test Connectivity on page 4-11
Export and Import Sessions on page 4-14
Prepare for Cluster Profiles on page 4-15
Start Admin Center
A new session of Admin Center has no cluster hosts listed, so the usual first step is to identify the hosts you want to include in your listing. To do this, click Add or Find. Further information continues in the next section, Set Up Resources on page 4-3. If you start Admin Center again on the same host, your previous session for that machine is loaded; and unless the update rate is set to never, Admin Center performs an update immediately for the listed hosts and processes. To clear this information and start a new session, select the pull-down File > New Session.
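If Admin Center is not already running, you can launch it with the admincenter script described in Chapter 6; a sketch, assuming the script lives in the same bin folder as the other control scripts (the Windows .bat extension is an assumption):

matlabroot\toolbox\distcomp\bin\admincenter.bat   (Windows)
matlabroot/toolbox/distcomp/bin/admincenter       (UNIX)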
Set Up Resources
In this section...
Add Hosts on page 4-3
Start mdce Service on page 4-4
Start an MJS on page 4-5
Start Workers on page 4-7
Stop, Destroy, Resume, Restart Processes on page 4-9
Move a Worker on page 4-10
Update the Display on page 4-10
Add Hosts
To specify the hosts you want listed in Admin Center, click Add or Find in the Welcome dialog box, or if this is not a new session, click Add or Find in the Hosts module.

In the Add or Find Hosts dialog box, identify the hosts you want to add to the listing, by one of the following methods:
- Select Enter Hostnames and provide short host names, fully qualified domain names, or individual IP addresses for the hosts.
- Select Enter IP Range and provide the range of IP addresses for your hosts.

If one of the hosts you have specified is running a MATLAB job scheduler (MJS), Admin Center automatically finds and lists all the hosts running workers registered with that MJS. Similarly, if you specify a host that is running a worker, Admin Center finds and lists the host running that worker's MJS, and then also all hosts running other workers under that MJS.
Start mdce Service
A dialog box leads you through the procedure of starting the mdce service on the selected hosts. There are five steps to the procedure in which you provide or confirm information for the service:
1 Specify remote platform: Windows or UNIX. You can start mdce on multiple hosts at the same time, but they all must be the same platform. If you have a mixed platform cluster, run the mdce startup separately for each type of platform.

2 Specify remote communication: Choose the protocol for communication with the remote hosts.
The dialog box looks like this for the first step:
At each step, you can click Help to read detailed information about that step.
Start an MJS
To start an MJS, click Start in the MJS module.
In the New MATLAB Job Scheduler dialog box, provide a name for the MJS, and select a host to run it on.
Alternative methods for starting an MJS include selecting the pull-down MJS > Start, or right-clicking a listed host and selecting Start MJS.

With an MJS running on your cluster, Admin Center might look like the following figure, with the MJS listed in the MJS module, as well as being listed by name in the Hosts module in the line for the host on which it is running.
Start Workers
To start MATLAB workers, click Start in the Workers module. In the Start Workers dialog box, specify the numbers of workers to start on each host, and select the hosts to run them. From the list, select the MJS for these workers. Click OK to start the workers. Admin Center automatically provides names for the workers, based on the hosts running them.
Alternative methods for starting workers include selecting the pull-down Workers > Start, or right-clicking a listed host or MJS and selecting Start Workers. With workers running on your cluster, Admin Center might look like the following figure, which shows the workers listed in the Workers module. Also, the number of workers running under the MJS is listed in the MJS module, and the number of workers for each MJS is listed in the Hosts module.
To get more information on any host, MJS, or worker listed in Admin Center, right-click its name in the display and select Properties. Alternatively, you can find the Properties option under the Hosts, MJS, and Workers drop-down menus.
Move a Worker
To move a worker from one host to another, you must completely shut it down, then start a new worker on the desired host:
1 Right-click the worker in the Workers module list.

2 Select Destroy. This shuts down the worker process and removes all its data.
3 If the old worker host is not running any other MDCS processes (mdce
service, MJS, or workers), you might want to remove it from the Admin Center listing.
4 If necessary, add the new host to the Admin Center host listing.

5 In the Workers module, click Start. Select the desired host in the Start Workers dialog box, along with the appropriate number and MJS name.

Use a similar process to move an MJS from one host to another. Note, however, that all workers registered with the MJS must be destroyed and started again, registering them with the new instance of the MJS.
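The same move can be scripted with the worker control commands covered in Chapter 6; a sketch with hypothetical host and worker names (stopworker with -clean removes the worker's checkpoint data, approximating Destroy):

stopworker -name worker1 -remotehost oldHost -clean
startworker -name worker1 -remotehost newHost -jobmanager MyJobManager -jobmanagerhost JMHost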
Test Connectivity
Admin Center lets you test communications between your MJS node, worker nodes, and the node where Admin Center is running. The tests are divided into four categories:

- Client: Verifies that the node running Admin Center is properly configured so that further cluster testing can proceed.
- Client to Nodes: Verifies that the node running Admin Center can identify and communicate with the other nodes in the cluster.
- Nodes to Nodes: Verifies that the other nodes in the cluster can identify each other, and that each node allows its mdce service to communicate with the mdce service on the other cluster nodes.
- Nodes to Client: Verifies that other cluster nodes can identify and communicate with the node running Admin Center.

First click Test Connectivity to open the Connectivity Testing dialog box. By default, the dialog box displays the results of the last test. To run new tests and update the display, click Run. During test execution, Admin Center displays this progress dialog box.
When the tests are complete, the Running Tests dialog box automatically closes, and Admin Center displays the test results in the Connectivity Testing dialog box.
The possible test results, indicated by symbols in the display, are:
- Test passed.
- Test passed; extra information is available.
- Test passed, but generated a warning.
- Test failed.
- Test was skipped, possibly because prerequisite tests did not pass.
Test results that include failures or other notable outcomes might look like the following figure.
Double-click any of the symbols in the test results to drill down for more detail. Use the Log tab to see the raw data from the tests. The results of the tests that run on only the client are displayed in the lower-left corner of the dialog box. To drill into client-only test results, click More Info.
5
Control Script Reference
mdce Process Control (p. 5-2): Control mdce service
Job Manager Control (p. 5-3): Control job manager
Worker Control (p. 5-4): Control MATLAB workers
mdce Process Control

mdce: Install, start, stop, or uninstall mdce service
nodestatus: Status of mdce processes running on a node
remotecopy: Copy file or folder to or from remote hosts
remotemdce: Execute mdce command on one or more remote hosts
Job Manager Control

startjobmanager: Start job manager process
stopjobmanager: Stop job manager process
Worker Control
startworker: Start MATLAB worker session
stopworker: Stop MATLAB worker session
6
Control Scripts Alphabetical List
admincenter
admincenter launches Admin Center. When setting up or using a MATLAB job scheduler (MJS) cluster, Admin Center allows you to establish and verify your cluster, and to diagnose possible problems. For details about using Admin Center, see:

Start Admin Center on page 4-2
Set Up Resources on page 4-3
Test Connectivity on page 4-11
See Also
createSharedSecret
createSharedSecret creates a shared secret file with the given filename. Before passing sensitive data from one service to another (e.g., between job manager and workers), these services need to establish a trust relationship using a shared secret. This script creates a file that serves as a shared secret between the services. Each service that has access to that secret file is trusted. Create the secret file only once per cluster on one machine, then copy it into the location specified by SHARED_SECRET_FILE in the mdce_def file on each machine before starting any job managers or workers. In a shared file system, all nodes can point to the same file. Shared secrets can be reused in subsequent sessions.
Examples
Create a shared secret file in a central location for all the nodes of the cluster:
cd matlabInstallDir/toolbox/distcomp/bin
createSharedSecret -file /share/secret
Then make sure that the nodes' shared or copied mdce_def files set the parameter SHARED_SECRET_FILE to /share/secret before starting the mdce service on each.
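If the nodes do not share a file system, one way to distribute the secret is with the remotecopy utility described later in this chapter; a sketch with hypothetical host names:

remotecopy -local /share/secret -to -remote /share/secret -remotehost hostA,hostB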
See Also
mdce
mdce
Purpose
Install, start, stop, or uninstall mdce service.

Syntax
mdce install
mdce uninstall
mdce start
mdce stop
mdce console
mdce restart
mdce status
mdce -version
mdce ... -mdcedef <mdce_defaults_file>
mdce ... -clean

Description
The mdce service ensures that all other processes are running and that it is possible to communicate with them. Once the mdce service is running, you can use the nodestatus command to obtain information about the mdce service and all the processes it maintains. The mdce executable resides in the folder matlabroot\toolbox\distcomp\bin (Windows operating system) or matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter the following commands at a DOS or UNIX command-line prompt, respectively.
mdce install installs the mdce service in the Microsoft Windows Service Control Manager. This causes the service to automatically start when the Windows operating system boots up. The service must be installed before it is started.

mdce uninstall uninstalls the mdce service from the Windows Service Control Manager. Note that if you wish to install mdce service as a different user, you must first uninstall the service and then reinstall as the new user.

mdce start starts the mdce service. This creates the required logging and checkpointing directories, and then starts the service as specified in the mdce defaults file.
mdce stop stops running the mdce service. This automatically stops all job managers and workers on the computer, but leaves their checkpoint information intact so that they will start again when the mdce service is started again.

mdce console starts the mdce service as a process in the current terminal or command window rather than as a service running in the background.

mdce restart performs the equivalent of mdce stop followed by mdce start. This command is available only on UNIX and Macintosh operating systems.

mdce ... -mdcedef <mdce_defaults_file> uses the specified alternative mdce defaults file instead of the one found in matlabroot/toolbox/distcomp/bin.

mdce ... -clean performs a complete cleanup of all service checkpoint and log files before installing or starting the service, or after stopping or uninstalling it. This deletes all information about any job managers or workers this service has ever maintained.

mdce status reports the status of the mdce service, indicating whether it is running and with what PID. Use nodestatus to obtain more detailed information about the mdce service. The mdce status command is available only on UNIX and Macintosh operating systems.

mdce -version prints version information of the mdce process to standard output, then exits.
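For example, a typical first-time setup on a Windows node, run from matlabroot\toolbox\distcomp\bin, might use the following sequence of the commands described above:

mdce install
mdce start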
See Also
nodestatus
nodestatus displays the status of the mdce service and the processes which it maintains. The mdce service must already be running on the specified computer. The nodestatus executable resides in the folder matlabroot\toolbox\distcomp\bin (Windows operating system) or matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter the following command syntax at a DOS or UNIX command-line prompt, respectively.
nodestatus -flags accepts the following input flags. Multiple flags can be used together on the same command.

-remotehost <hostname>
Displays the status of the mdce service and the processes it maintains on the specified host. The default value is the local host.

-infolevel <level>
Specifies how much status information to report, using a level of 1-3. 1 means only the basic information, 3 means all information available. The default value is 1.

-baseport <port_number>
Specifies the base port that the mdce service on the remote host is using. You need to specify this only if the value of BASE_PORT in the local mdce_def file does not match the base port being used by the mdce service on the remote host.

-v
Verbose mode displays the progress of the command execution.
Examples
Display basic information about the mdce processes on the local host.
nodestatus
Display detailed information about the status of the mdce processes on host node27.
nodestatus -remotehost node27 -infolevel 2
See Also
remotecopy
Copy file or folder to or from one or more remote hosts using transport protocol
remotecopy <flags> <protocol options>

remotecopy copies a file or folder to or from one or more remote hosts by using a transport protocol (such as rsh or ssh). Copying from multiple hosts creates a separate file per host, appending the hostname to the specified filename. The general form of the syntax is:

remotecopy <flags> <protocol options>
-local <filename>
Specify the name of the file or folder on the local host.

-remote <filename>
Specify the name of the file or folder on the remote host.

-from
Specify to copy from the remote hosts to the local host. You must use either the -from flag, or the -to flag.

-to
Specify to copy to the remote hosts from the local host. You must use either the -from flag, or the -to flag.

-remotehost host1[,host2[,...]]
Specify the names of the hosts where you want to copy to or from. Separate the host names by commas without any white spaces. This is a mandatory argument.

-remoteplatform <platform>
Specify the platform of the remote hosts. This option is required only if different from the local platform.

-quiet
Prevent remotecopy from prompting for missing information. The command fails if all required information is not specified.

-help
Print the help information for this command.

<protocol options>
Force the usage of a particular protocol type. Specifying a protocol type with all its required parameters also avoids interactive prompting and allows for use in scripts. The supported protocol types are scp, sftp and rcp. To get more information about one particular protocol type, enter:

remotecopy -protocol <type> -help

For example:

remotecopy -protocol sftp -help
Note The file permissions on the copy might not be the same as the permissions on the original file.
Examples
Copy the local file mdce_def.sh to two other machines. (Enter this command on a single line.)
remotecopy -local mdce_def.sh -to -remote /matlab/toolbox/distcomp/bin -remotehost hostA,hostB
Retrieve folders of the same name from two hosts to the local machine. (Enter command on a single line.)
remotecopy -local C:\temp\log -from -remote C:\temp\mdce\log -remotehost winHost1,winHost2
See Also
remotemdce
remotemdce
remotemdce executes the mdce command on one or more remote hosts. The following table describes the supported flags and options. They can be combined in the same command. Note that flags are each preceded by a dash (-).

<mdce options>
Options and arguments of the mdce command, such as start, stop, etc. See the mdce reference page for a full list.

-matlabroot <installfoldername>
The MATLAB installation folder on the remote hosts, required only if the remote installation folder differs from the one on the local machine.

-remotehost host1[,host2[,...]]
The names of the hosts where you want to run the mdce command. Separate the host names by commas without any white spaces. This is a mandatory argument.

-remoteplatform <platform>
The platform of the remote hosts. This option is required only if different from the local platform.

-quiet
Prevent mdce from prompting the user for missing information. The command fails if all required information is not specified.

-help
Print help information.
<protocol options>
Force the usage of a particular protocol type. Specifying a protocol type with all its required parameters also avoids interactive prompting and allows for use in scripts. The supported protocol types are ssh, rsh, and winsc. To get more information about one particular protocol type, enter:

remotemdce -protocol <type> -help

For example:

remotemdce -protocol winsc -help

Using the winsc protocol requires that you log in as a user with admin privileges on the remote host.
Note If you are using OpenSSHd on a Microsoft Windows operating system, you can encounter a problem when using backslashes in path names for your command options. In most cases, you can work around this problem by using forward slashes instead. For example, to specify the file C:\temp\mdce_def.bat, you should identify it as C:/temp/mdce_def.bat.
Examples
Start mdce on three remote machines of the same platform as the client:
remotemdce start -remotehost hostA,hostB,hostC
Start mdce in a clean state on two UNIX operating system machines from a Windows operating system machine, using the ssh protocol. Enter the following command on a single line:
remotemdce start -clean -matlabroot /usr/local/matlab -remotehost unixHost1,unixHost2 -remoteplatform UNIX -protocol ssh
See Also
mdce | remotecopy
startjobmanager
startjobmanager starts a job manager process and the associated job manager lookup process under the mdce service. The startjobmanager executable resides in the folder matlabroot\toolbox\distcomp\bin (Windows operating system) or matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter the following command syntax at a DOS or UNIX command-line prompt, respectively.
startjobmanager -flags accepts the following input flags. Multiple flags can be used together on the same command.
-name <job_manager_name>
Specifies the name of the job manager. This identifies the job manager to MATLAB worker sessions and MATLAB clients. The default is the value of the DEFAULT_JOB_MANAGER_NAME parameter in the mdce_def file.

-remotehost <hostname>
Specifies the name of the host where you want to start the job manager and the job manager lookup process. If omitted, they are started on the local host.
-clean
Deletes all checkpoint information stored on disk from previous instances of this job manager before starting. This cleans the job manager so that it initializes with no jobs or tasks.

-multicast
Overrides the use of unicast to contact the job manager lookup process. It is recommended that you not use -multicast unless you are certain that multicast works on your network. This overrides the setting of JOB_MANAGER_HOST in the mdce_def file on the remote host, which would have the job manager use unicast. If this flag is omitted and JOB_MANAGER_HOST is empty, the job manager uses unicast to contact the job manager lookup process running on the same host.

-baseport <port_number>
Specifies the base port that the mdce service on the remote host is using. You need to specify this only if the value of BASE_PORT in the local mdce_def file does not match the base port being used by the mdce service on the remote host.

-v
Verbose mode displays the progress of the command execution.

Examples

Start the job manager MyJobManager on the local host.

startjobmanager -name MyJobManager
See Also
startworker
startworker starts a MATLAB worker process under the mdce service. The startworker executable resides in the folder matlabroot\toolbox\distcomp\bin (Windows operating system) or matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter the following command syntax at a DOS or UNIX command-line prompt, respectively.
startworker -flags accepts the following input flags. Multiple flags can be used together on the same command, except where noted.
-name <worker_name>
Specifies the name of the MATLAB worker. The default is the value of the DEFAULT_WORKER_NAME parameter in the mdce_def file.

-remotehost <hostname>
Specifies the name of the computer where you want to start the MATLAB worker. If omitted, the worker is started on the local computer.

-jobmanager <job_manager_name>
Specifies the name of the job manager this MATLAB worker will receive tasks from. The default is the value of the DEFAULT_JOB_MANAGER_NAME parameter in the mdce_def file.
-jobmanagerhost <job_manager_hostname>
Specifies the host on which the job manager is running. The worker uses unicast to contact the job manager lookup process on that host to register with the job manager. This overrides the setting of JOB_MANAGER_HOST in the mdce_def file on the worker computer, which would also have the worker use unicast. Cannot be used together with -multicast.

-multicast
If you are certain that multicast works on your network, you can force the worker to use multicast to locate the job manager lookup process by specifying -multicast. Note: If you are using this flag to change the settings of and restart a stopped worker, then you should also use the -clean flag. Cannot be used together with -jobmanagerhost.

-clean
Deletes all checkpoint information associated with this worker name before starting.

-baseport <port_number>
Specifies the base port that the mdce service on the remote host is using. You only need to specify this if the value of BASE_PORT in the local mdce_def file does not match the base port being used by the mdce service on the remote host.

-v
Verbose mode displays the progress of the command execution.
Examples
Start a worker on the local host, using the default worker name, registering with the job manager MyJobManager on the host JMHost.
startworker -jobmanager MyJobManager -jobmanagerhost JMHost
Start a worker on the host WorkerHost, using the default worker name, and registering with the job manager MyJobManager on the host JMHost. (The following command should be entered on a single line.)
startworker -jobmanager MyJobManager -jobmanagerhost JMHost -remotehost WorkerHost
Start two workers, named worker1 and worker2, on the host WorkerHost, registering with the job manager MyJobManager that is running on the host JMHost. Note that to start two workers on the same computer, you must give them different names. (Each of the two commands below should be entered on a single line.)
startworker -name worker1 -remotehost WorkerHost -jobmanager MyJobManager -jobmanagerhost JMHost

startworker -name worker2 -remotehost WorkerHost -jobmanager MyJobManager -jobmanagerhost JMHost
See Also
stopjobmanager
stopjobmanager stops a job manager that is running under the mdce service. The stopjobmanager executable resides in the folder matlabroot\toolbox\distcomp\bin (Windows operating system) or matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter the following command syntax at a DOS or UNIX command-line prompt, respectively.
stopjobmanager -flags accepts the following input flags. Multiple flags can be used together on the same command.

-name <job_manager_name>
Specifies the name of the job manager to stop. The default is the value of the DEFAULT_JOB_MANAGER_NAME parameter in the mdce_def file.
-remotehost <hostname>
Specifies the name of the host where you want to stop the job manager and the associated job manager lookup process. The default value is the local host.

-clean
Deletes all checkpoint information stored on disk for the current instance of this job manager after stopping it. This cleans the job manager of all its job and task data.
-baseport <port_number>
Specifies the base port that the mdce service on the remote host is using. You need to specify this only if the value of BASE_PORT in the local mdce_def file does not match the base port being used by the mdce service on the remote host.

-v
Verbose mode displays the progress of the command execution.
Examples

Stop the job manager MyJobManager on the local host.

stopjobmanager -name MyJobManager
See Also
stopworker
stopworker stops a MATLAB worker process that is running under the mdce service. The stopworker executable resides in the folder matlabroot\toolbox\distcomp\bin (Windows operating system) or matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter the following command syntax at a DOS or UNIX command-line prompt, respectively.
stopworker -flags accepts the following input flags. Multiple flags can be used together on the same command.

-name <worker_name>
Specifies the name of the MATLAB worker to stop. The default is the value of the DEFAULT_WORKER_NAME parameter in the mdce_def file.

-remotehost <hostname>
Specifies the name of the host where you want to stop the MATLAB worker. The default value is the local host.

-clean
Deletes all checkpoint information associated with this worker name after stopping it.
-baseport <port_number>
Specifies the base port that the mdce service on the remote host is using. You need to specify this only if the value of BASE_PORT in the local mdce_def file does not match the base port being used by the mdce service on the remote host.

-v
Verbose mode displays the progress of the command execution.
Examples
Stop the worker with the default name on the local host.
stopworker
Stop the worker with the default name, running on the computer WorkerHost.
stopworker -remotehost WorkerHost
Stop the workers named worker1 and worker2, running on the computer WorkerHost.
stopworker -name worker1 -remotehost WorkerHost

stopworker -name worker2 -remotehost WorkerHost
See Also
Glossary
CHECKPOINTBASE
The name of the parameter in the mdce_def file that defines the location of the job manager and worker checkpoint directories.

checkpoint directory
Location where job manager checkpoint information and worker checkpoint information is stored.

client
The MATLAB session that defines and submits the job. This is the MATLAB session in which the programmer usually develops and prototypes applications. Also known as the MATLAB client.

client computer
The computer running the MATLAB client.

cluster
A collection of computers that are connected via a network and intended for a common purpose.

coarse-grained application
An application for which run time is significantly greater than the communication time needed to start and stop the program. Coarse-grained distributed applications are also called embarrassingly parallel applications.

codistributed array
An array partitioned into segments, with each segment residing in the workspace of a different lab.

Composite
An object in a MATLAB client session that provides access to data values stored on the labs in a MATLAB pool, such as the values of variables that are assigned inside an spmd statement.

computer
A system with one or more processors.
distributed application
The same application that runs independently on several nodes, possibly with different input parameters. There is no communication, shared data, or synchronization points between the nodes. Distributed applications can be either coarse-grained or fine-grained.

DNS
Domain Name System. A system that translates Internet domain names into IP addresses.

dynamic licensing
The ability of a MATLAB worker or lab to employ all the functionality you are licensed for in the MATLAB client, while checking out only a server product license. When a job is created in the MATLAB client with Parallel Computing Toolbox software, the products for which the client is licensed will be available for all workers or labs that evaluate tasks for that job. This allows you to run any code on the cluster for which you are licensed on your MATLAB client, without requiring extra licenses for the worker beyond that for the MATLAB Distributed Computing Server product. For a list of products that are not eligible for use with Parallel Computing Toolbox software, see http://www.mathworks.com/products/ineligible_programs/.

fine-grained application
An application for which run time is significantly less than the communication time needed to start and stop the program. Compare to coarse-grained applications.

head node
Usually, the node of the cluster designated for running the job manager and license manager. It is often useful to run all the nonworker-related processes on a single machine.

heterogeneous cluster
A cluster that is not homogeneous.

homogeneous cluster
A cluster of identical machines, in terms of both hardware and software.
job
The complete large-scale operation to perform in MATLAB, composed of a set of tasks.
job manager
The MathWorks process that queues jobs and assigns tasks to workers. A third-party process that performs this function is called a scheduler. The general term scheduler can also refer to a job manager.

job manager checkpoint information
Snapshot of information necessary for the job manager to recover from a system crash or reboot.

job manager database
The database that the job manager uses to store the information about its jobs and tasks.

job manager lookup process
The process that allows clients, workers, and job managers to find each other. It starts automatically when the job manager starts.

lab
When workers start, they work independently by default. They can then connect to each other and work together as peers, and are then referred to as labs.

LOGDIR
The name of the parameter in the mdce_def file that defines the directory where logs are stored.

MathWorks job manager
See job manager.

MATLAB client
See client.

MATLAB pool
A collection of labs that are reserved by the client for execution of parfor-loops or spmd statements. See also lab.
MATLAB worker
See worker.

mdce
The service that has to run on all machines before they can run a job manager or worker. This is the server foundation process, making sure that the job manager and worker processes that it controls are always running. Note that the program and service name is all lowercase letters.

mdce_def file
The file that defines all the defaults for the mdce processes by allowing you to set preferences or definitions in the form of parameter values.

MPI
Message Passing Interface, the means by which labs communicate with each other while running tasks in the same job.

node
A computer that is part of a cluster.
parallel application
The same application that runs on several labs simultaneously, with communication, shared data, or synchronization points between the labs.

private array
An array which resides in the workspaces of one or more, but perhaps not all labs. There might or might not be a relationship between the values of these arrays among the labs.

random port
A random unprivileged TCP port, i.e., a random TCP port above 1024.

register a worker
The action that happens when both worker and job manager are started and the worker contacts the job manager.
replicated array
An array which resides in the workspaces of all labs, and whose size and content are identical on all labs.

scheduler
The process, either third-party or the MathWorks job manager, that queues jobs and assigns tasks to workers.

spmd (single program multiple data)
A block of code that executes simultaneously on multiple labs in a MATLAB pool. Each lab can operate on a different data set or different portion of distributed data, and can communicate with other participating labs while performing the parallel computations.

task
One segment of a job to be evaluated by a worker.

variant array
An array which resides in the workspaces of all labs, but whose content differs on these labs.

worker
The MATLAB process that performs the task computations. Also known as the MATLAB worker or worker process.

worker checkpoint information
Files required by the worker during the execution of tasks.
Index

A
admincenter control script 6-2

C
checkpoint directory, definition Glossary-1
checkpoint folder, locating 2-18
CHECKPOINTBASE, definition Glossary-1
clean state, starting services 2-16
client
    definition Glossary-1
    process 1-5
client computer, definition Glossary-1
cluster, definition Glossary-1
coarse-grained application, definition Glossary-1
Composite, definition Glossary-1
computer, definition Glossary-1
configuring MATLAB Distributed Computing Server 2-5
control scripts
    admincenter 6-2
    createSharedSecret 6-3
    customizing 2-13
    defaults 2-13
    mdce 6-4
    nodestatus 6-6
    remotecopy 6-8
    remotemdce 6-11
    startjobmanager 6-14
    startworker 6-17
    stopjobmanager 6-20
    stopworker 6-22
createSharedSecret control script 6-3

D
distributed application, definition Glossary-2
DNS, definition Glossary-2
dynamic licensing, definition Glossary-2

F
fine-grained application, definition Glossary-2

H
head node, definition Glossary-2
heterogeneous cluster
    definition Glossary-2
    support 1-8
homogeneous cluster, definition Glossary-2

I
installing MATLAB Distributed Computing Server 2-5

J
job, definition Glossary-3
job manager
    checkpoint information, definition Glossary-3
    database, definition Glossary-3
    definition Glossary-3
    logs 2-17
    lookup process, definition Glossary-3
    multiple on one machine 2-14
    stopping on UNIX or Macintosh 2-9
    stopping on Windows 2-11

L
lab, definition Glossary-3
log files, locating 2-17
LOGDIR, definition Glossary-3

M
MathWorks job manager, see job manager
MATLAB client, definition Glossary-3
MATLAB pool, definition Glossary-3
MATLAB worker, definition Glossary-4
mdce (service), definition Glossary-4
mdce control script 6-4
mdce_def file, definition Glossary-4
MJS
    process 1-5
    versus third-party scheduler 1-7
MPI, definition Glossary-4

N
network
    administration 2-1
    layout 2-2
    preparation 2-2
    requirements 2-3
    security 2-4
node, definition Glossary-4
nodestatus control script 6-6

P
parallel application, definition Glossary-4
parallel computing products
    server 1-5
    toolbox 1-5
    version 1-4
Parallel Computing Toolbox, using 1-9
platforms, supported 1-8

R
random port, definition Glossary-4
register a worker, definition Glossary-4
remotecopy control script 6-8
remotemdce control script 6-11
requirements 2-3

S
scheduler
    definition Glossary-5
    third-party 1-6
security 2-4
spmd, definition Glossary-5
startjobmanager control script 6-14
startworker control script 6-17
stopjobmanager control script 6-20
stopworker control script 6-22

T
task, definition Glossary-5
third-party scheduler 1-6
    versus MJS 1-7
troubleshooting
    license errors 2-23
    memory errors 2-25
    verifying multicast 2-29
    Windows network installation 2-25

U
user, setting 2-14

W
worker
    definition Glossary-5
    process 1-5
worker checkpoint information, definition Glossary-5
workers
    logs 2-17
    stopping on UNIX or Macintosh 2-9
    stopping on Windows 2-11