Issues with new version of CUDA (11.2)

brittlyd · January 14, 2021, 8:11pm

Hello,

I’ve been running tests with GPUSPH for the last year without any problems, using a previous version of CUDA (I think it was 11.0). However, last week I had a graphics card driver issue and had to reinstall CUDA, and I ended up with CUDA 11.2 (released in December 2020). Now when I try to build a problem (I tried make WaveTank), I get the following error:

[CONF] make show
make: *** No rule to make target ‘/usr/local/cuda/bin/…/targets/x86_64-linux/include/thrust/detail/type_traits/algorithm/intermediate_type_from_function_and_iterators.h’, needed by ‘build/problems/WaveTank.o’. Stop.
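One plausible reading of this error (an assumption; the thread does not confirm it): the dep/*.d dependency files used by GPUSPH’s Makefile (they appear in the make show output later in this thread) were generated against the previous CUDA installation and still name a Thrust header that the new installation no longer provides. make treats every listed header as a prerequisite and stops when one is missing and it has no rule to create it. The hypothetical C++17 helper below, which is not part of GPUSPH or make, parses a Makefile-style dependency rule and lists the prerequisites that are missing on disk.

```cpp
#include <cstddef>
#include <filesystem>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: given the text of a make-style dependency rule
// ("target: prereq1 prereq2 ...", possibly continued over several lines
// that end in a backslash), return the prerequisites that do not exist
// on disk. Only the first rule in the text is examined.
std::vector<std::string> missing_prerequisites(const std::string& rule_text) {
    // Join backslash-newline continuations into one logical line.
    std::string joined;
    for (std::size_t i = 0; i < rule_text.size(); ++i) {
        if (rule_text[i] == '\\' && i + 1 < rule_text.size() && rule_text[i + 1] == '\n') {
            joined += ' ';
            ++i;  // also skip the newline
        } else {
            joined += rule_text[i];
        }
    }
    // Keep only the first logical line, i.e. the first rule.
    joined = joined.substr(0, joined.find('\n'));

    std::vector<std::string> missing;
    const std::size_t colon = joined.find(':');
    if (colon == std::string::npos) return missing;  // not a rule

    // Everything after the colon is a whitespace-separated prerequisite list.
    std::istringstream prerequisites(joined.substr(colon + 1));
    std::string path;
    while (prerequisites >> path) {
        if (!std::filesystem::exists(path)) missing.push_back(path);
    }
    return missing;
}
```

Run on the dependency rule for WaveTank.o, it would be expected to report the vanished Thrust header. Deleting the stale dependency files lets make regenerate them from the headers that are actually installed.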

I tried installing 11.0, but I think there’s a driver compatibility issue that I haven’t quite figured out.

Any suggestions for this error?

Thank you so much!

brittlyd · January 15, 2021, 7:33pm

An update: I reinstalled GPUSPH (and still have CUDA 11.2), and now WaveTank compiles, but when I try to run it I get the following error:

./WaveTank: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory

I found libcudart.so.11.0 in /usr/local/cuda/lib64, so I’m not sure whether GPUSPH simply isn’t finding it or whether something went wrong with the install.
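The “error while loading shared libraries” message is printed by the dynamic loader before any of GPUSPH’s own code runs. A library that exists on disk can still be “not found” if its directory is not on the loader’s search path (LD_LIBRARY_PATH, the ldconfig cache, or the default system directories). The sketch below, assuming Linux with glibc, asks the loader to resolve a library by its bare soname through dlopen, which follows a similar search order; can_load is a hypothetical helper name.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: try to load a shared library through the dynamic
// loader. A bare soname (no '/') is looked up in LD_LIBRARY_PATH, the
// ldconfig cache and the default system directories. On failure, the
// loader's error message is stored in *error (if provided).
bool can_load(const std::string& soname, std::string* error = nullptr) {
    void* handle = dlopen(soname.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        if (error != nullptr) {
            const char* msg = dlerror();
            *error = (msg != nullptr) ? msg : "unknown dlopen error";
        }
        return false;
    }
    dlclose(handle);  // we only wanted to know whether it resolves
    return true;
}
```

If can_load("libcudart.so.11.0") fails while the file is present in /usr/local/cuda/lib64, adding that directory to LD_LIBRARY_PATH, or to an ldconfig configuration file followed by rerunning ldconfig, is the usual remedy. On glibc versions older than 2.34, link with -ldl.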

I’m quite confused; any help is appreciated!

giuseppe.bilotta · January 16, 2021, 3:31pm

Hello @brittlyd

I suspect that on your system there might be a mixup between the two versions of CUDA you have installed. Can you please provide the output of make show?

brittlyd · January 21, 2021, 10:08pm

Hi @giuseppe.bilotta,

I basically reinstalled everything, so I am now using CUDA 11.1 and GPUSPH v5.0 on Ubuntu 20.04. There are no other versions of CUDA installed, since I did a fresh install of Ubuntu. The problem now compiles, but when I run it, this is the output:

Starting workers...
number of forces rigid bodies particles = 0
Device 0 thread 7fea031c8000 iteration 0 last command: 0 (IDLE). Exception: src/cuda/cudautil.cc(41) : in checkCUDA() @ thread 0x140643051274240 : cudaSafeCallNoSync() runtime API error 101 : invalid device ordinal
src/GPUWorker.cc(1106) : in deallocateDeviceBuffers() @ thread 0x140643051274240 : cudaSafeCall() runtime API error 101 : invalid device ordinal
FATAL: GPUSPH: problem during initialization
terminate called without an active exception

And the results of make show are:
[CONF] make show
GPUSPH version: v5.0
Platform: Linux
Architecture: x86_64
Current dir: /home/custer/gpusph
This Makefile: /home/custer/gpusph/Makefile
Used Makefiles: Makefile Makefile.conf dep/HDF5SphReader.d dep/predcorr_alloc_policy.d dep/Integrator.d dep/Writer.d dep/simframework.d dep/command_type.d dep/GPUWorker.d dep/pugixml.d dep/Synchronizer.d dep/ProblemCore.d dep/main.d dep/vector_print.d dep/Options.d dep/Reader.d dep/GPUSPH.d dep/base64.d dep/VTUReader.d dep/ParticleSystem.d dep/buffer_traits.d dep/debugflags.d dep/XYZReader.d dep/cuda/cudautil.d dep/geometries/Vector.d dep/geometries/STLMesh.d dep/geometries/Cylinder.d dep/geometries/Point.d dep/geometries/TopoCube.d dep/geometries/Object.d dep/geometries/Disk.d dep/geometries/EulerParameters.d dep/geometries/Cone.d dep/geometries/Cube.d dep/geometries/Sphere.d dep/geometries/Torus.d dep/geometries/Rect.d dep/geometries/Plane.d dep/integrators/RepackingIntegrator.d dep/integrators/PredictorCorrectorIntegrator.d dep/problem_api/ProblemAPI_1.d dep/writers/UDPWriter.d dep/writers/CustomTextWriter.d dep/writers/CallbackWriter.d dep/writers/CommonWriter.d dep/writers/HotFile.d dep/writers/VTKWriter.d dep/writers/VTKLegacyWriter.d dep/writers/TextWriter.d dep/writers/HotWriter.d dep/NetworkManager.d dep/problems/StillWater.d dep/problems/WaveTank.d dep/problems/DamBreak3D.d dep/StillWater.gen.d dep/WaveTank.gen.d dep/DamBreak3D.gen.d
Problem:
Linearization: yzx
Snapshot file: ./GPUSPH-v5.0-2019-06-13.tgz
Last problem: WaveTank
Sources dir: src src/adaptors src/cuda src/geometries src/integrators src/problem_api src/problems src/writers
Options dir: options
Objects dir: build build/adaptors build/cuda build/geometries build/integrators build/problem_api build/problems build/problems/user build/writers
Scripts dir: scripts
Docs dir: docs
Doxygen conf:
Verbose:
Debug: 0
CXX: g++
CXX version: g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
MPICXX: g++
nvcc: /usr/local/cuda/bin/nvcc -ccbin=g++
nvcc version: 11.1
LINKER: /usr/local/cuda/bin/nvcc -ccbin=g++
Compute cap.: 75
Fastmath: 0
USE_MPI: 0
USE_HDF5: 0
USE_CHRONO: 0
default paths: /usr/include/c++/9 /usr/include/x86_64-linux-gnu/c++/9 /usr/include/c++/9/backward /usr/lib/gcc/x86_64-linux-gnu/9/include /usr/local/include /usr/include/x86_64-linux-gnu /usr/include
INCPATH: -Isrc -Isrc/adaptors -Isrc/cuda -Isrc/geometries -Isrc/integrators -Isrc/problem_api -Isrc/problems -Isrc/writers -Isrc/problems -Isrc/problems/user -Ioptions
LIBPATH: -L/usr/local/lib -L/usr/local/cuda/lib64
LIBS: -lcudart -lpthread -lrt
LDFLAGS: -L/usr/local/lib -L/usr/local/cuda/lib64 -arch=sm_75
CPPFLAGS: -Isrc -Isrc/adaptors -Isrc/cuda -Isrc/geometries -Isrc/integrators -Isrc/problem_api -Isrc/problems -Isrc/writers -Isrc/problems -Isrc/problems/user -Ioptions -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -D_GLIBCXX_USE_C99_MATH -DUSE_HDF5=0 -D__COMPUTE__=75
CXXFLAGS: -m64 -std=c++11 -O3
CUFLAGS: -arch=sm_75 --generate-line-info -std=c++11 --compiler-options -m64,-O3

giuseppe.bilotta · January 22, 2021, 7:10am

Hello @brittlyd

The error “invalid device ordinal” indicates that GPUSPH cannot find the requested device. This may mean either that you specified a nonexistent device (e.g. by setting GPUSPH_DEVICE or by passing the --device command-line parameter), or that the device is unavailable. Can you double-check with nvidia-smi that the GPUs are available, and that the examples from the CUDA SDK work correctly?
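To make the “invalid device ordinal” diagnosis concrete: the CUDA runtime numbers the devices visible to a process from 0 to count-1, where count is what cudaGetDeviceCount reports (CUDA_VISIBLE_DEVICES can hide and renumber devices), and it returns error 101, cudaErrorInvalidDevice, for any ordinal outside that range. The hypothetical helper below shows only that range check. In real code the count would come from cudaGetDeviceCount; here it is a plain parameter so the logic runs without a GPU. How GPUSPH itself parses GPUSPH_DEVICE or --device is not modeled.

```cpp
#include <vector>

// Hypothetical helper: return the requested device ordinals that the CUDA
// runtime would reject. Valid ordinals are 0 .. device_count-1, where
// device_count is the number of devices visible to the process.
std::vector<int> invalid_ordinals(const std::vector<int>& requested, int device_count) {
    std::vector<int> bad;
    for (int dev : requested) {
        if (dev < 0 || dev >= device_count) bad.push_back(dev);
    }
    return bad;
}
```

For example, a run asking for devices 0, 1 and 2 on a machine where only one GPU is visible would fail on devices 1 and 2, which is consistent with the error appearing after cards were removed or not seated correctly.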

brittlyd · January 22, 2021, 8:22pm

Hi @giuseppe.bilotta,

I think I made a mistake yesterday when removing two of my three graphics cards. The cards now seem to be installed properly, and I’m no longer getting that error. However, I now get the following error:

Entering the main simulation cycle
Simulation time t=0.000000e+00s, iteration=0, dt=1.000000e-04s, 55,124 parts (0, cum. 0 MIPPS), maxneibs 74+0
Device 0 thread 7fe061cb9000 iteration 0 last command: 43 (FORCES_SYNC). Exception: src/cuda/forces.cu(509) : in unbind_textures() @ thread 0x140601690132480 : cudaSafeCall() runtime API error 700 : an illegal memory access was encountered
GPUSPH aborted by worker thread
Elapsed time of simulation cycle: 0.028s
Peak particle speed was ~0 m/s at 0 s -> can set maximum vel 0 for this problem
Simulation end, cleaning up...
src/GPUWorker.cc(1106) : in deallocateDeviceBuffers() @ thread 0x140601690132480 : cudaSafeCall() runtime API error 700 : an illegal memory access was encountered
Deallocating...

And the results of make show are:

[CONF] make show
GPUSPH version:  v5.0
Platform:        Linux
Architecture:    x86_64
Current dir:     /home/custer/gpusph
This Makefile:   /home/custer/gpusph/Makefile
Used Makefiles:   Makefile Makefile.conf dep/HDF5SphReader.d dep/predcorr_alloc_policy.d dep/Integrator.d dep/Writer.d dep/simframework.d dep/command_type.d dep/GPUWorker.d dep/pugixml.d dep/Synchronizer.d dep/ProblemCore.d dep/main.d dep/vector_print.d dep/Options.d dep/Reader.d dep/GPUSPH.d dep/base64.d dep/VTUReader.d dep/ParticleSystem.d dep/buffer_traits.d dep/debugflags.d dep/XYZReader.d dep/cuda/cudautil.d dep/geometries/Vector.d dep/geometries/STLMesh.d dep/geometries/Cylinder.d dep/geometries/Point.d dep/geometries/TopoCube.d dep/geometries/Object.d dep/geometries/Disk.d dep/geometries/EulerParameters.d dep/geometries/Cone.d dep/geometries/Cube.d dep/geometries/Sphere.d dep/geometries/Torus.d dep/geometries/Rect.d dep/geometries/Plane.d dep/integrators/RepackingIntegrator.d dep/integrators/PredictorCorrectorIntegrator.d dep/problem_api/ProblemAPI_1.d dep/writers/UDPWriter.d dep/writers/CustomTextWriter.d dep/writers/CallbackWriter.d dep/writers/CommonWriter.d dep/writers/HotFile.d dep/writers/VTKWriter.d dep/writers/VTKLegacyWriter.d dep/writers/TextWriter.d dep/writers/HotWriter.d dep/NetworkManager.d dep/problems/StillWater.d dep/problems/WaveTank.d dep/problems/DamBreak3D.d dep/StillWater.gen.d dep/WaveTank.gen.d dep/DamBreak3D.gen.d
Problem:         
Linearization:   yzx
Snapshot file:   ./GPUSPH-v5.0-2019-06-13.tgz
Last problem:    WaveTank
Sources dir:     src src/adaptors src/cuda src/geometries src/integrators src/problem_api src/problems src/writers
Options dir:     options
Objects dir:     build build/adaptors build/cuda build/geometries build/integrators build/problem_api build/problems build/problems/user build/writers
Scripts dir:     scripts
Docs dir:        docs
Doxygen conf:    
Verbose:         
Debug:           0
CXX:             g++
CXX version:     g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
MPICXX:          g++
nvcc:            /usr/local/cuda/bin/nvcc -ccbin=g++
nvcc version:    11.1
LINKER:          /usr/local/cuda/bin/nvcc -ccbin=g++
Compute cap.:    75
Fastmath:        0
USE_MPI:         0
USE_HDF5:        0
USE_CHRONO:      0
default paths:   /usr/include/c++/9 /usr/include/x86_64-linux-gnu/c++/9 /usr/include/c++/9/backward /usr/lib/gcc/x86_64-linux-gnu/9/include /usr/local/include /usr/include/x86_64-linux-gnu /usr/include
INCPATH:          -Isrc -Isrc/adaptors -Isrc/cuda -Isrc/geometries -Isrc/integrators -Isrc/problem_api -Isrc/problems -Isrc/writers -Isrc/problems -Isrc/problems/user -Ioptions
LIBPATH:          -L/usr/local/lib -L/usr/local/cuda/lib64
LIBS:             -lcudart -lpthread -lrt
LDFLAGS:           -L/usr/local/lib -L/usr/local/cuda/lib64 -arch=sm_75
CPPFLAGS:          -Isrc -Isrc/adaptors -Isrc/cuda -Isrc/geometries -Isrc/integrators -Isrc/problem_api -Isrc/problems -Isrc/writers -Isrc/problems -Isrc/problems/user -Ioptions -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -D_GLIBCXX_USE_C99_MATH -DUSE_HDF5=0 -D__COMPUTE__=75
CXXFLAGS:         -m64 -std=c++11 -O3
CUFLAGS:          -arch=sm_75 --generate-line-info -std=c++11 --compiler-options -m64,-O3

This is the same illegal memory access error I was getting before. I ran a check on the GPU memory and it passed, so I don’t think it’s a hardware issue (although I’m still running a few more tests).

I’m sorry for the constantly changing problems, I really appreciate your help!

giuseppe.bilotta · January 25, 2021, 7:25am

No problem: this is actually an issue with GPUSPH and the latest versions of CUDA. A bug in Thrust prevents the sort phase from completing correctly, and the incorrectly sorted data then breaks the subsequent computations. This (together with a number of compile-time warnings that also appear with CUDA 11) should be fixed in the “next” branch of GPUSPH. Could you please try that branch?
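The thread does not show GPUSPH’s sort code. As a hedged illustration of how a faulty sort can surface later as an illegal memory access, here is a hypothetical C++ sketch, assuming (as in many GPU SPH codes) that particles are sorted by a cell hash and each cell’s particles are then located through start/end indices. The name cell_ranges and its data layout are illustrative, not GPUSPH’s API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical illustration: after sorting particles by cell hash, each
// cell's particles should occupy one contiguous range, and a neighbor
// search reads only that range. If the sort silently fails, ranges built
// from the keys are wrong and kernels indexing with them can read out of
// bounds. This sketch checks sortedness up front instead of trusting it.
// Returns, for each cell in [0, num_cells), the half-open range
// [start, end) of particle indices; empty cells get start == end == 0.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
cell_ranges(const std::vector<std::uint32_t>& sorted_hash, std::uint32_t num_cells) {
    if (!std::is_sorted(sorted_hash.begin(), sorted_hash.end()))
        throw std::runtime_error("particle hashes are not sorted");

    std::vector<std::pair<std::uint32_t, std::uint32_t>> ranges(num_cells, {0, 0});
    for (std::size_t i = 0; i < sorted_hash.size(); ++i) {
        const std::uint32_t cell = sorted_hash[i];
        if (cell >= num_cells)
            throw std::out_of_range("cell hash outside the grid");
        const auto idx = static_cast<std::uint32_t>(i);
        // First particle of a new cell opens its range.
        if (i == 0 || sorted_hash[i - 1] != cell) ranges[cell].first = idx;
        // Every particle extends the range of its cell.
        ranges[cell].second = idx + 1;
    }
    return ranges;
}
```

With sorted keys {0, 0, 2, 2, 2} and 3 cells, cell 0 covers particles [0, 2), cell 1 is empty and cell 2 covers [2, 5). An unsorted key array is rejected here, whereas code that assumes sorted input would build ranges that no longer match the data.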