GPU Hardware Requirements, Compatibility, and Performance

GPUs, like most modern digital technologies, are constantly evolving and it’s difficult to keep up with what’s available on the market. Furthermore, the numerous options make hardware comparison difficult, particularly when one’s knowledge of the subject is limited. While there are hardware reviews and comparisons online, they are often directed towards gaming performance, which isn’t an “apples to apples” comparison with scientific computing. That being said, I wanted to create a new topic to draw some attention to hardware requirements, compatibility, and performance.

I realize GPUSPH is software, but it relies on dedicated GPU hardware and I felt there should at least be some active documentation reflecting the available hardware and how GPUSPH can make optimal use of it. I can see the documentation contains very general information on GPU hardware requirements, which is certainly helpful, but I’m guessing we can do better as a community built around a dedicated software application.

Where is this coming from? Well, I’m going to be starting a project that utilizes GPUSPH and I need to build a new workstation with GPU support. My approach is to start by picking the proper GPU, then build my system around it. As outlined above, the numerous choices are daunting. While I would like to ask a few specific questions, it is my hope that developers will take into consideration the points I’ve outlined above and maybe actively monitor hardware developments as well.

First off…I’m likely going with an Nvidia graphics card. Which one, I’m not sure. I’ve done some homework and it seems that the software should dictate which card type you need. For example, single vs double precision? ECC memory? These features make a significant price difference. Are either of these two things important to GPUSPH? In the documentation I did notice that GeForce cards were used, which seems to indicate that GPUSPH does not require ECC or double precision? Are there other hardware features that GPUSPH does or doesn’t require? Are there hardware features that GPUSPH can utilize to improve performance? I see a lot of stuff over on Nvidia’s website about NVLink, which seems like a nice feature, but I don’t know if GPUSPH can utilize this feature or not…maybe it doesn’t matter at the application level? What’s new with this “NVIDIA TITAN RTX”? Does GPUSPH support the “Turing” architecture? What are these “Tensor cores” and does GPUSPH make use of these?

Obviously I could go on asking all kinds of questions. The point is, the sale of a GPU card generally begins with a question along the lines of “what software will you be running on it”, and I feel the GPUSPH community should have better support in this area.

Hello @GWAVE, and thanks for your interest in GPUSPH.

I agree with you that it would be quite useful to have a more detailed specification of which hardware features are, respectively, needed, relevant and irrelevant to GPUSPH. I’ll start with a quick list here, and we’ll look into integrating it more cleanly in the documentation.

  • NVIDIA GPU; since we use CUDA for the device side, only NVIDIA GPUs are supported; other hardware manufacturers are not supported (yet);
  • single-precision only; we intentionally avoid the use of double precision, so the double-precision computational capability of the GPU is irrelevant for GPUSPH;
  • we do not use the tensor cores;
  • we do not use the ray-tracing cores;
  • ECC is not required; GPUSPH will benefit from ECC memory if present, since ECC will be able to detect and in some cases fix hardware-related memory errors, but this is independent from coding issues;
  • NVLink is not required; GPUSPH will benefit from NVLink if present (faster transfer rates between GPUs, and between host and GPUs);
  • hardware generation: anything supported by the CUDA version you’re using; with GPUSPH version 5 we rely on the C++11 features that are only fully supported from CUDA 8 onwards, which may limit support for very old architectures (Compute Capability 1 (Tesla) and 2 (Fermi));
  • the latest hardware generations (Volta and Turing) should work, but we did no architecture-specific optimizations on them;
  • since we do not require any specific workstation or higher class feature, the choice of GPU class (Tesla vs Quadro vs GeForce) is entirely up to the user, based on budget considerations vs performance requirements (e.g. it might be preferable to buy 2 TITAN cards instead of a Tesla of the same generation).
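
To make the hardware-generation points above a bit more concrete, here is a minimal Python sketch that maps a CUDA Compute Capability to its architecture name and repeats the corresponding remark from the list. It is not part of GPUSPH: the function names are hypothetical, and the Kepler, Maxwell and Pascal entries are filled in from NVIDIA's general naming rather than from the list above. The major and minor numbers are assumed to come from elsewhere, for example from the `major` and `minor` fields that the CUDA runtime's `cudaGetDeviceProperties` reports.

```python
# Architecture names by Compute Capability major version.
# Major version 7 is handled separately because it covers two generations.
ARCHITECTURES = {1: "Tesla", 2: "Fermi", 3: "Kepler", 5: "Maxwell", 6: "Pascal"}

# Generations the list above flags as possibly too old for GPUSPH version 5.
POSSIBLY_UNSUPPORTED = {"Tesla", "Fermi"}


def architecture_name(major, minor=0):
    """Return the architecture name for a Compute Capability, or 'unknown'."""
    if major == 7:
        # 7.0 and 7.2 are Volta; 7.5 is Turing.
        return "Turing" if minor >= 5 else "Volta"
    return ARCHITECTURES.get(major, "unknown")


def gpusph5_note(major, minor=0):
    """Return a short remark (hypothetical helper) echoing the list above."""
    name = architecture_name(major, minor)
    if name == "unknown":
        return "unknown generation: check the CUDA documentation"
    if name in POSSIBLY_UNSUPPORTED:
        return f"{name}: may be too old for the CUDA versions GPUSPH 5 needs"
    if name in ("Volta", "Turing"):
        return f"{name}: should work, no architecture-specific optimizations"
    return f"{name}: fine if your CUDA version supports it"


if __name__ == "__main__":
    # Example: a Turing card such as the TITAN RTX has Compute Capability 7.5.
    print(gpusph5_note(7, 5))
```

This only encodes the rules of thumb given above; whether a specific card works still depends on the CUDA version installed.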