Calculations can be parallelized if the user’s license and hardware allow it. Clicking the Edit button opens the Parallelization Options dialog, where one of Sequential, Parallel (Shared Memory), Automatic Number of Threads, or Cluster can be chosen.
When Sequential is selected, only one thread is used for the computation.
Selecting Parallel (Shared Memory) runs the parallel computation with the specified Number of Processes on shared-memory hardware. The dialog then shows the maximum number of available processors and the maximum number of licensed parallel processes.
If Automatic Number of Threads is selected, the number of parallel processes is chosen automatically for optimal speed, based on the number of CPU cores and the number of licensed parallel processes. For up to eight Available Threads, all of them are used. If more than eight threads are available, two cases can occur:
- If the number of Available Threads is larger than (or equal to) the Number of CPU Cores, the maximum of eight and the Number of CPU Cores divided by 2 is used.
- If the number of Available Threads is smaller than the Number of CPU Cores, the maximum of eight and the number of Available Threads divided by 2 is used.
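The rules above can be sketched as a small function. The following is an illustrative reconstruction of the described heuristic, not the actual implementation:

```python
def automatic_thread_count(available_threads: int, cpu_cores: int) -> int:
    """Illustrative sketch of the Automatic Number of Threads heuristic
    described above (an assumption based on the prose, not vendor code)."""
    # Up to eight available threads: use all of them.
    if available_threads <= 8:
        return available_threads
    # More than eight threads available, at least as many as CPU cores:
    if available_threads >= cpu_cores:
        return max(8, cpu_cores // 2)
    # More than eight threads available, but fewer than CPU cores:
    return max(8, available_threads // 2)
```

For example, on a 32-core machine with 32 licensed threads, 16 processes would be used; with only 12 licensed threads on the same machine, 8 processes would be used.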
The parallelization of the solvers is implemented with one of two technical methods: MPI parallelization or thread parallelization. The following table shows which method each solver supports:
| Solver    | MPI Parallel | Thread Parallel |
|-----------|--------------|-----------------|
| EJ        | ✔            | ✖               |
| SimpleFFT | ✔            | ✖               |
| LIR       | ✖            | ✔               |
Cluster parallelization requires that the solver supports the MPI parallelization method, while Parallel (Shared Memory) parallelization works with either MPI or thread parallelization. Thus, the LIR solver does not support Cluster parallelization.
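The relationship between the table and the available run modes can be expressed as a small lookup. The data structure and helper functions here are illustrative, not part of the software:

```python
# Parallelization methods supported by each solver (from the table above).
SOLVER_METHODS = {
    "EJ": {"MPI"},
    "SimpleFFT": {"MPI"},
    "LIR": {"Thread"},
}

def supports_cluster(solver: str) -> bool:
    # Cluster parallelization requires MPI support.
    return "MPI" in SOLVER_METHODS[solver]

def supports_shared_memory(solver: str) -> bool:
    # Parallel (Shared Memory) works with either MPI or thread parallelization.
    return len(SOLVER_METHODS[solver]) > 0
```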
Cluster is available to users of Linux systems only. For details on how to set up and run parallel computations, consult the High Performance Computations User Guide.
Three examples of parallelization benchmark results for FlowDict are shown here. All benchmark computations were run on our server with 2 x Intel E5-2697A v4 processors (16 cores each, running at up to 3.60 GHz) and 1,024 GB RAM.
Sandstone Benchmark
The first example is the computation of the flow through synthetic X-ray tomography data of a sandstone structure (Mattila et al., 2016). The structure has a porosity of 13.5% and a size of 2,048 x 2,048 x 2,032 voxels.
Together with an inlet and outlet of 8 voxels each, the flow computation is performed on a structure of size 2,048³ voxels. This size is optimal for the runtime, since 2,048 is a power of two: the smaller the factors in the prime factorization of the number of voxels in each coordinate direction, the shorter the runtime of SimpleFFT.
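The effect of the voxel counts on SimpleFFT can be checked by factorizing the dimensions. This small helper is illustrative:

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factorization of n (ascending, with multiplicity)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# 2,048 = 2^11 contains only the smallest possible prime factor,
# which is optimal for the runtime of SimpleFFT.
print(prime_factors(2048))  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

# 2,032 = 2^4 * 127 contains the large factor 127; padding the
# third direction to 2,048 with the inlet and outlet avoids it.
print(prime_factors(2032))  # [2, 2, 2, 2, 127]
```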
The flow through the structure is computed by solving the Stokes equation with the LIR solver (speed optimization and memory optimization) and with the SimpleFFT solver. The following figure shows, for both solvers, the runtime over the number of processes for computing the permeability of 3.3e-13 m² with an error bound of 0.01. Runtime and parallel speedup are better for the LIR solver. For both solvers, the ideal speedup, i.e., halving the runtime when doubling the number of processes, is also shown. The computation with the LIR solver comes close to the ideal speedup, which is usually not attainable for real-life examples.
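Speedup and distance from the ideal speedup can be quantified as follows. The runtimes in the example are hypothetical, not the measured benchmark values:

```python
def speedup(runtime_reference: float, runtime_parallel: float) -> float:
    """Speedup relative to a reference run (e.g., the smallest process count)."""
    return runtime_reference / runtime_parallel

def parallel_efficiency(runtime_reference: float, runtime_parallel: float,
                        process_ratio: float) -> float:
    """Fraction of the ideal speedup achieved. 1.0 means the runtime is
    halved for every doubling of the number of processes."""
    return speedup(runtime_reference, runtime_parallel) / process_ratio

# Hypothetical runtimes: 100 s on 4 processes, 30 s on 16 processes.
print(speedup(100.0, 30.0))                      # ≈ 3.33
print(parallel_efficiency(100.0, 30.0, 16 / 4))  # ≈ 0.83
```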
Filter Benchmark
The second benchmark is the computation of the flow through a filter structure with linearly increasing fiber density.
The structure size is again 2,048 x 2,048 x 2,032 voxels, with an inlet and an outlet of 8 voxels each (i.e., 8.6 billion voxels in total). So, as for the sandstone structure, the size is optimal for a short runtime even for a large structure. The porosity of 94.2% is much higher than in the sandstone example.
The flow through the structure is computed by solving the Stokes equation with the LIR solver (speed optimization) and with the SimpleFFT solver.
The following figure shows the runtime over the number of processes for computing the permeability: 5.63e-11 m² for SimpleFFT and 5.69e-11 m² for LIR. Both results agree within the chosen error bound of 1%. The benefit of using the LIR solver compared to SimpleFFT is large for this structure with high porosity. The computation with the LIR solver comes close to the ideal speedup, which is usually not attainable for real-life examples.
Multilayer Weave Benchmark
The third benchmark is the computation of fast airflow through a multilayer weave. The structure has a size of 400 x 800 x 1,024 voxels, including a 150-voxel inlet and an 800-voxel outlet (i.e., ~300 million voxels in total).
The flow through the structure is computed by solving the Navier-Stokes equation with the LIR solver (speed optimization) and with the SimpleFFT solver, with and without using Tdma. The Velocity Inlet, Pressure Outlet boundary condition is used, and the average flow velocity is 3 m/s.
Using Tdma with the SimpleFFT solver reduces the runtime in this benchmark by up to 50%.
The memory requirements of both solvers for these examples are listed in the following table: