Cluster Computing
A computer cluster typically consists of compute nodes with identical hardware configurations that communicate with each other over a very fast interconnect (e.g. InfiniBand). With a floating license, GeoDict can be used with its GUI on such clusters, but typically a simulation script is submitted to a job queue management system instead.
A job queue scheduler is a computer application for controlling unattended background program execution of jobs. This is commonly called batch scheduling, as execution of non-interactive jobs is often called batch processing. Two commonly used job schedulers are:
- Portable Batch System (PBS): software that performs job scheduling. Its primary task is to allocate computational tasks, i.e., batch jobs, among the available computing resources. It is often used in conjunction with UNIX cluster environments.
- Slurm Workload Manager (SLURM): a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.
Both schedulers can be used to submit and perform GeoDict simulation jobs.

Important! You can only make use of multiple compute nodes on a distributed memory cluster for the following solvers: EJ (flow, conduction, diffusion), SimpleFFT (flow, conduction, diffusion), FeelMath (stiffness) and Tracker (particulate flow).
All structure generation commands and any command using the LIR or BEST solver cannot use distributed memory, and thus can only make use of a single node on a distributed memory cluster!
GeoDict must be installed either on each compute node or on a shared file system that every compute node can access. A floating license installed on a license server is required, and each compute node must have access to the license server. Node-locked licenses do not work for cluster computing.
Three steps are needed to start large simulations on a Linux cluster:
1. Enable password-less login on cluster compute nodes
When using SSH to log in to (other) cluster nodes, the system typically asks for a password. This interferes with multi-process job startup procedures. However, the SSH configuration can be changed to allow password-less login to cluster compute nodes. The script enablePasswordLessLogin.sh makes this change for you:
- Switch to the GeoDict installation folder.
- Execute ./enablePasswordLessLogin.sh
This script has to be executed while logged in to your Linux account on the cluster. It can also be executed on your "local" Linux computer if your local computer and the cluster share the same account. After the script has run, it is possible to log in to the cluster nodes without entering a password.
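Whether password-less login is actually in place can be checked from the shell. The following is a minimal sketch and not part of GeoDict: the function names and the host names node01 and node02 are hypothetical. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# Sketch only: function names and host names are illustrative assumptions.

# check_passwordless_login NODE
# Returns 0 if NODE accepts an SSH login without asking for a password.
check_passwordless_login() {
    # BatchMode=yes disables password prompts, so ssh fails instead of waiting.
    # ConnectTimeout=5 keeps the check short when a node is unreachable.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# report_nodes NODE...
# Prints one status line per node.
report_nodes() {
    for node in "$@"; do
        if check_passwordless_login "$node"; then
            echo "$node: ok"
        else
            echo "$node: password required or node unreachable"
        fi
    done
}
```

Calling `report_nodes node01 node02` after running enablePasswordLessLogin.sh would then print one line per node. Replace the host names with those of your own compute nodes.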
2. Prepare a cluster simulation script
A submission shell script is needed to start a simulation on a cluster. This shell script contains control information for the job submission system (e.g. number of nodes) and calls GeoDict with a floating license and a simulation script. A template for such a script is available in the GeoDict installation folder (Linux version only) under the name:
- PBSClusterSimulationTemplate.sh for PBS, and
- SLURMClusterSimulationTemplate.sh for SLURM.
The following description covers PBS; the procedure for SLURM is very similar.
- Adjust the following lines of the submission script according to your GeoDict installation and cluster settings:
  - Line 20: Maximum runtime of the simulation. If the simulation lasts longer than the specified runtime, the job is cancelled.
  - Line 21: Number of nodes and processes per node (ppn).
  - Line 30: Path to your MPI installation.
  - Line 33: Path to your GeoDict installation.
  - Line 36: Path to your GeoDict floating license.
  - Line 39: Path to your simulation script.
- Create a simulation script that performs the simulation. Make sure that the parallelization settings in the simulation script are set to use the cluster.
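The submission script settings listed above (runtime, nodes and processes per node, and the four paths) can be sketched as follows. This is an illustration in Torque-style PBS syntax, not a copy of PBSClusterSimulationTemplate.sh, which remains the authoritative version. All values, variable names and paths are placeholder assumptions, and the line positions do not match the line numbers given above.

```shell
#!/bin/bash
# Sketch of a PBS submission script header (Torque-style syntax).
# Values, variable names and paths are placeholders, not the template's own.

#PBS -N GeoDictClusterSimulation
#PBS -l walltime=24:00:00
#PBS -l nodes=2:ppn=16

# Hypothetical variable names and paths; adjust to your installation.
MPI_PATH="/opt/openmpi"
GEODICT_PATH="/opt/GeoDict"
LICENSE_FILE="$HOME/geodict-floating-license"
SIMULATION_SCRIPT="$HOME/simulations/simulation_script"

# Report every path that is not visible from the node running this script,
# e.g. because the installation is not on a shared file system.
missing=0
for path in "$MPI_PATH" "$GEODICT_PATH" "$LICENSE_FILE" "$SIMULATION_SCRIPT"; do
    if [ ! -e "$path" ]; then
        echo "not found: $path"
        missing=$((missing + 1))
    fi
done
echo "paths checked: 4, missing: $missing"

# The GeoDict call from the shipped template would follow here.
# It is deliberately not reproduced in this sketch.
```

In a SLURM script, the corresponding directives would typically be `#SBATCH --time=24:00:00`, `#SBATCH --nodes=2` and `#SBATCH --ntasks-per-node=16`. Check SLURMClusterSimulationTemplate.sh for the settings it actually uses.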
3. Submit a simulation on the cluster
- Log in to the master node of the cluster (e.g. with PuTTY or SSH).
- Change the working directory to the folder that contains the submission shell script.
- Make sure that enough disk space is available for temporarily saving flow fields.
- For PBS: Start the simulation with the command:
qsub <Path to Shell Script Folder>/PBSClusterSimulationTemplate.sh
- For SLURM: Start the simulation with the command:
sbatch <Path to Script Folder>/SLURMClusterSimulationTemplate.sh
- A log file for the simulation run is created with the name GeoDictClusterSimulation.out.
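The submission steps above can be wrapped in a few small shell helpers. This is a sketch under stated assumptions rather than a GeoDict tool: the function names are hypothetical, and the scheduler is guessed from which submission command is found on the PATH.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# free_kb DIR
# Prints the free space of the file system holding DIR, in KiB.
free_kb() {
    # -P selects the portable one-line-per-file-system format, -k reports KiB.
    # Column 4 of that format is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# detect_scheduler
# Prints "slurm", "pbs" or "none", depending on the submission command found.
detect_scheduler() {
    # sbatch is tested first because some Slurm installations also
    # provide a qsub compatibility wrapper.
    if command -v sbatch >/dev/null 2>&1; then
        echo slurm
    elif command -v qsub >/dev/null 2>&1; then
        echo pbs
    else
        echo none
    fi
}

# submit_job SCRIPT
# Submits SCRIPT with the detected scheduler.
submit_job() {
    case "$(detect_scheduler)" in
        slurm) sbatch "$1" ;;
        pbs)   qsub "$1" ;;
        *)     echo "no job scheduler found on this machine" >&2
               return 1 ;;
    esac
}
```

On the master node, `free_kb .` shows the space available in the working directory, and `submit_job ./PBSClusterSimulationTemplate.sh` submits the job. Once the job is running, `tail -f GeoDictClusterSimulation.out` follows the log file, and `qstat` (PBS) or `squeue` (SLURM) shows the state of the queue.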