I have a question regarding my recent experience with running SPH simulations on multiple GPUs. While exploring various options, I came across GPUSPH, an excellent platform for such computations. To optimize performance, I ran a simulation using two NVIDIA graphics cards, a 3080 and a 3070, connected to the same PC. However, I found that the 3080 alone outperformed the combined performance of both GPUs when I started the simulation with the `--gpudirect 0,1` option. This outcome left me wondering whether there are specific steps or considerations to take into account when utilizing multiple GPUs within the same system. I would greatly appreciate any guidance or advice you can provide on optimizing the performance of multiple GPUs in a shared computing environment.
Hello @Cfdlearner, thanks for looking into GPUSPH.
First of all, GPUDirect should not be relevant in your case, given that it only affects multi-node multi-GPU usage, while you mention that your GPUs are both connected to the same machine.
The first thing you should look into is enabling striping (the `--striping` command-line option), which enables concurrent computation and data transfers.
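As a quick sketch of what this looks like in practice (the binary name and the `--device` flag for selecting GPUs are assumptions based on a typical GPUSPH build; adapt them to your setup):

```shell
# Select both GPUs and enable striping, so data transfers between devices
# can overlap with computation instead of serializing after it.
./GPUSPH --device 0,1 --striping
```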
Another important aspect to consider is how the domain is being split across the two GPUs, and how this relates to the linearization used when indexing the cells. By default GPUSPH will split the domain along the longest axis (test cases can override this by providing a fillDeviceMap member function). It's useful if the linearization follows the split. You can check whether the linearization is good in your case by looking at the message about "data transfers compacted in X bursts" printed right before the peer accessibility table. With two GPUs, there should only be 1 burst per device if the linearization is optimal. Something like `xzy` or `yzx` is often a good choice.
You can change the linearization used by compiling with `make linearization=<your choice here>`.
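Putting the two steps together, a minimal sketch might look like this (the binary name and the `--device` flag are assumptions; `make linearization=` and the "bursts" message are from the discussion above):

```shell
# Rebuild with a different cell linearization, then run and look for the
# "data transfers compacted in X bursts" line to verify: with two GPUs,
# 1 burst per device indicates the linearization matches the domain split.
make linearization=yzx
./GPUSPH --device 0,1 --striping | grep -i "bursts"
```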
I have attached my configuration and I still get the same result: one GPU performs better, even after asking the two GPUs to work together with `--striping`. How can I solve this?
Hello @Cfdlearner, thanks for reporting back. I think in your case there are three aspects to consider:
- the test case you’re trying uses FEA, which relies on Chrono and only uses the CPU; this is going to limit scaling efficiency;
- from your log, it seems that your GPUs cannot access each other peer-to-peer, so to exchange data they have to go through the host, which also limits scaling efficiency;
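You can verify the peer-accessibility situation independently of GPUSPH with NVIDIA's own tooling (interpretation of the legend is a rule of thumb, not a guarantee):

```shell
# Print the GPU-to-GPU topology matrix. Direct links (PIX, PXB, NV#)
# generally allow peer access; paths through the host bridge or across
# sockets (PHB, NODE, SYS) typically force transfers through host memory.
nvidia-smi topo -m
```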
- the number of particles in the test case you are running is low: the multi-GPU benefits become more apparent as the test cases grow larger.
For testing, could you try the DamBreak3D test case first? Check the difference in runtime between one and two GPUs with `--ppH 32` (it probably won't make much of a difference) versus `--ppH 64` (you should see much higher performance).
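A sketch of that comparison, assuming a per-problem build via `make problem=` and the `--device` flag for GPU selection (both assumptions about your build; only `--ppH` and the DamBreak3D name come from the discussion above):

```shell
# Build the DamBreak3D test case, then time 1-GPU vs 2-GPU runs at two
# resolutions: the larger problem (--ppH 64) should show the multi-GPU gain.
make problem=DamBreak3D
./GPUSPH --device 0   --ppH 32   # 1 GPU, small problem
./GPUSPH --device 0,1 --ppH 32   # 2 GPUs: little or no speedup expected
./GPUSPH --device 0   --ppH 64   # 1 GPU, larger problem
./GPUSPH --device 0,1 --ppH 64   # 2 GPUs: speedup should be clearly visible
```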
@giuseppe.bilotta thank you for your guidance, it worked well.