Tests not working

Manuel · November 6, 2020, 4:18pm

Hello,

I have just installed GPUSPH and tried to run some of the tests in the src/problems folder. However, I get warnings, and the results at the end of the simulations are wrong.
For example, this is what I get for the DamBreak3D problem:

Simulation time t=4.523146e-02s, iteration=186, dt=2.659091e-04s, 84,444 parts (15, cum. 14 MIPPS), maxneibs 122+0
Simulation time t=5.006359e-02s, iteration=208, dt=2.017437e-04s, 84,444 parts (15, cum. 14 MIPPS), maxneibs 124+0
Simulation time t=5.512060e-02s, iteration=231, dt=2.659091e-04s, 84,444 parts (15, cum. 14 MIPPS), maxneibs 124+0
Simulation time t=6.015487e-02s, iteration=251, dt=2.424981e-04s, 84,444 parts (14, cum. 14 MIPPS), maxneibs 124+0
Simulation time t=6.505197e-02s, iteration=271, dt=2.659091e-04s, 84,444 parts (13, cum. 14 MIPPS), maxneibs 125+0
WARNING: at iteration 280 the number of particles changed from 84444 to 84443 for no known reason!
WARNING: at iteration 280, time 0.0671853 particle ID 0 is at indices 0 and 1!
WARNING: at iteration 280, time 0.0671853 particle ID 1 was not found!
Recap of devices after roll call:

device at index 0 has 84,443 particles assigned and offset 0
Simulation time t=7.002765e-02s, iteration=293, dt=2.659091e-04s, 84,443 parts (15, cum. 14 MIPPS), maxneibs 125+0
WARNING: at iteration 310 the number of particles changed from 84443 to 84442 for no known reason!
WARNING: at iteration 310, time 0.0737294 particle ID 67044 was not found!
Recap of devices after roll call:
device at index 0 has 84,442 particles assigned and offset 0
Simulation time t=7.523180e-02s, iteration=317, dt=1.200993e-04s, 84,442 parts (16, cum. 14 MIPPS), maxneibs 125+0
WARNING: at iteration 330 the number of particles changed from 84442 to 84439 for no known reason!
Simulation time t=8.005969e-02s, iteration=345, dt=1.771645e-04s, 84,439 parts (16, cum. 14 MIPPS), maxneibs 125+0
WARNING: at iteration 350 the number of particles changed from 84439 to 84434 for no known reason!

WARNING: at iteration 1930 the number of particles changed from 71562 to 71512 for no known reason!
WARNING: current max. neighbors numbers (132 | 0) greater than max possible neibs (127 | 0) at iteration 1930
possible culprit: 1216 (neibs: 132 + 0 | 0)
WARNING: at iteration 1940 the number of particles changed from 71512 to 71487 for no known reason!
WARNING: current max. neighbors numbers (131 | 0) greater than max possible neibs (127 | 0) at iteration 1940
possible culprit: 960 (neibs: 131 + 0 | 0)
WARNING: particle 0 (id 0, type 0) has NAN position! (nan, nan, nan) @ (0, 0, 0) = (nan, nan, nan) at iteration 1941, time 0.330071
Simulation time t=3.300714e-01s, iteration=1,941, dt=2.085889e-04s, 71,487 parts (20, cum. 20 MIPPS), maxneibs 180+0
WARNING: at iteration 1950 the number of particles changed from 71487 to 71441 for no known reason!
WARNING: particle 0 (id 0, type 0) has NAN position! (nan, nan, nan) @ (0, 0, 0) = (nan, nan, nan) at iteration 1950, time 0.331993
WARNING: at iteration 1960 the number of particles changed from 71441 to 71387 for no known reason!
WARNING: particle 0 (id 0, type 0) has NAN position! (nan, nan, nan) @ (0, 0, 0) = (nan, nan, nan) at iteration 1967, time 0.335032
Simulation time t=3.350320e-01s, iteration=1,967, dt=2.200599e-04s, 71,387 parts (22, cum. 20 MIPPS), maxneibs 180+0
WARNING: at iteration 1970 the number of particles changed from 71387 to 71367 for no known reason!
WARNING: particle 0 (id 0, type 0) has NAN position! (nan, nan, nan) @ (0, 0, 0) = (nan, nan, nan) at iteration 1970, time 0.335692
WARNING: at iteration 1980 the number of particles changed from 71367 to 71351 for no known reason!
WARNING: particle 0 (id 0, type 0) has NAN position! (nan, nan, nan) @ (0, 0, 0) = (nan, nan, nan) at iteration 1990, time 0.340093
Simulation time t=3.400934e-01s, iteration=1,990, dt=2.200599e-04s, 71,351 parts (20, cum. 20 MIPPS), maxneibs 180+0
WARNING: at iteration 1990 the number of particles changed from 71351 to 71332 for no known reason!
WARNING: particle 0 (id 0, type 0) has NAN position! (nan, nan, nan) @ (0, 0, 0) = (nan, nan, nan) at iteration 1990, time 0.340093
WARNING: at iteration 2000 the number of particles changed from 71332 to 71313 for no known reason!
WARNING: at iteration 2010 the number of particles changed from 71313 to 71312 for no known reason!
WARNING: particle 0 (id 0, type 0) has NAN position! (nan, nan, nan) @ (0, 0, 0) = (nan, nan, nan) at iteration 2014, time 0.345084
Simulation time t=3.450835e-01s, iteration=2,014, dt=2.200599e-04s, 71,312 parts (21, cum. 20 MIPPS), maxneibs 180+0
WARNING: at iteration 2020 the number of particles changed from 71312 to 71273 for no known reason!
WARNING: particle 0 (id 0, type 0) has NAN position! (nan, nan, nan) @ (0, 0, 0) = (nan, nan, nan) at iteration 2020, time 0.346147
WARNING: at iteration 2030 the number of particles changed from 71273 to 71202 for no known reason!
WARNING: particle 0 (id 0, type 0) has NAN position! (nan, nan, nan) @ (0, 0, 0) = (nan, nan, nan) at iteration 2038, time 0.350108
Simulation time t=3.501078e-01s, iteration=2,038, dt=2.200599e-04s, 71,202 parts (20, cum. 20 MIPPS), maxneibs 180+0

Basically, the particles slowly flow out of the domain.
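
To put a number on the loss, the particle-count warnings can be tallied from a saved copy of the output. This is only a minimal Python sketch: the regular expression is inferred from the warning lines shown above, and `particle_losses`, `total_lost`, and `sample` are hypothetical names for illustration, not part of GPUSPH.

```python
import re

# Matches the particle-count warnings as they appear in the log above.
COUNT_CHANGE = re.compile(
    r"WARNING: at iteration (\d+) the number of particles changed "
    r"from (\d+) to (\d+)"
)


def particle_losses(log_text):
    """Return (iteration, before, after) for every count-change warning."""
    return [
        (int(iteration), int(before), int(after))
        for iteration, before, after in COUNT_CHANGE.findall(log_text)
    ]


def total_lost(losses):
    """Sum the particles lost over the reported warnings."""
    return sum(before - after for _, before, after in losses)


# Three warning lines copied from the log above.
sample = """\
WARNING: at iteration 280 the number of particles changed from 84444 to 84443 for no known reason!
WARNING: at iteration 310 the number of particles changed from 84443 to 84442 for no known reason!
WARNING: at iteration 330 the number of particles changed from 84442 to 84439 for no known reason!
"""

losses = particle_losses(sample)
print(losses[0], total_lost(losses))  # (280, 84444, 84443) 5
```

Run on the full log, the same functions show at which iteration the loss starts and how quickly it grows.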

Do you have any idea why this happens? Could it be that my GPU or CUDA installation is not working properly?
Here are the GPU/CUDA details:

CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: “Tesla V100-PCIE-32GB”
CUDA Driver Version: 11.1
CUDA Capability Major/Minor version number: 7.0
Total amount of global memory: 32510 MBytes (34089730048 bytes)
(80) Multiprocessors, ( 64) CUDA Cores/MP: 5120 CUDA Cores
GPU Max Clock rate: 1380 MHz (1.38 GHz)
Memory Clock rate: 877 Mhz
Memory Bus Width: 4096-bit
L2 Cache Size: 6291456 bytes
Max Texture Dimension Sizes 1D=(131072) 2D=(131072, 65536) 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and kernel execution: Yes with 7 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 59 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Result = PASS

Thanks for the help.

Regards

Manuel

saikali · November 7, 2020, 12:22pm

Hello Manuel,
Have you changed anything in the DamBreak3D.cu file?

Regards
Elie

Manuel · November 8, 2020, 11:36am

Hi Elie,

thanks for your reply.

No, I haven’t changed anything in the DamBreak3D.cu file. I first installed the CUDA Toolkit 11.1, then Project Chrono, and finally GPUSPH. I ran “make test” and nothing else. I also tried running some other tests, but they all gave me similar errors.

I am using Ubuntu 18.04.5 LTS. CUDA 10.1 was already installed on the machine, so now I have both CUDA 11.1 and 10.1. Could they be conflicting with each other? Before compiling GPUSPH, I changed the CUDA path in the Makefile to the 11.1 installation folder.
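
One way to see which toolkits coexist and which one the shell actually picks up is sketched below in Python. It assumes the toolkits live in versioned directories such as /usr/local/cuda-11.1 (the layout that appears in the compiler output in this thread); `installed_toolkits` and `toolkit_of` are hypothetical helper names. It only inspects the filesystem and PATH, so it says nothing about which CUDA libraries an already-built binary links against.

```python
import glob
import os
import shutil


def installed_toolkits(root="/usr/local"):
    """List versioned CUDA directories such as <root>/cuda-11.1."""
    pattern = os.path.join(root, "cuda-*")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


def toolkit_of(path, toolkits):
    """Return the toolkit directory that contains `path`, or None.

    Symlinks are resolved first, so a path that goes through a
    symlinked directory is attributed to the real versioned directory.
    """
    real = os.path.realpath(path)
    for toolkit in toolkits:
        toolkit_real = os.path.realpath(toolkit)
        if real == toolkit_real or real.startswith(toolkit_real + os.sep):
            return toolkit
    return None


toolkits = installed_toolkits()
nvcc = shutil.which("nvcc")  # the nvcc found first on PATH, if any
print("toolkits found:", toolkits)
print("nvcc on PATH:", nvcc)
print("nvcc belongs to:", toolkit_of(nvcc, toolkits) if nvcc else None)
```

If the toolkit reported for `nvcc` differs from the CUDA path set in the Makefile, the build may be mixing the two installations.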

Thank you for your help.

Regards

Manuel

Manuel · November 9, 2020, 8:55am

I forgot to mention that I am trying to use GPUSPH on a remote server.

Furthermore, regarding the CUDA 11.1 installation, I noticed that some of the samples (e.g. nbody, Mandelbrot) do not run. I get this type of error:

CUDA error at bodysystemcuda_impl.h:186 code=999(cudaErrorUnknown) “cudaGraphicsGLRegisterBuffer(&m_pGRes[i], m_pbo[i], cudaGraphicsMapFlagsNone)”

I found this thread on the NVIDIA forum:

and I am currently trying to solve this issue. However, could this be the reason why GPUSPH does not run properly?

Thanks for the help.

Regards

Manuel

Manuel · November 9, 2020, 9:10am

Another point: when I compile any problem, I get these warning messages:

/usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/detail/config/cpp_dialect.h:104:13: warning: Thrust requires C++14. Please pass -std=c++14 to your compiler. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
THRUST_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/cub/util_arch.cuh:36:0,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/cuda/detail/util.h:32,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/cuda/detail/for_each.h:34,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/detail/adl/for_each.h:42,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/detail/for_each.inl:27,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/for_each.h:279,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/detail/generic/transform.inl:19,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/detail/generic/transform.h:105,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/detail/transform.inl:25,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/transform.h:724,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/detail/generic/reduce_by_key.inl:29,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/detail/generic/reduce_by_key.h:88,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/detail/reduce.inl:26,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/reduce.h:784,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/detail/generic/find.inl:19,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/detail/generic/find.h:62,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/detail/find.inl:25,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/find.h:384,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/detail/generic/sort.inl:26,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/system/detail/generic/sort.h:153,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/detail/sort.inl:26,
from /usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/thrust/sort.h:1361,
from src/cuda/buildneibs.cu:36,
from src/cuda/cudasimframework.cu:46,
from src/problems/StillWater.cu:32:
/usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/cub/util_cpp_dialect.cuh:129:13: warning: CUB requires C++14. Please pass -std=c++14 to your compiler. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
CUB_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/cuda_runtime.h: In instantiation of ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, size_t) [with T = float4; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’:
src/cuda/forces.cu:479:31: required from ‘void CUDAForcesEngine<kerneltype, sph_formulation, densitydiffusiontype, ViscSpec, boundarytype, simflags>::bind_textures(const BufferList&, uint, RunMode) [with KernelType kerneltype = (KernelType)3; SPHFormulation sph_formulation = (SPHFormulation)1; DensityDiffusionType densitydiffusiontype = (DensityDiffusionType)3; ViscSpec = FullViscSpec<(RheologyType)1, (TurbulenceModel)0, (ComputationalViscosityType)0, (ViscousModel)0, (AverageOperator)0, 1, true>; BoundaryType boundarytype = (BoundaryType)3; long unsigned int simflags = 1; uint = unsigned int]’
/tmp/tmpxft_0000534f_00000000-6_StillWater.cudafe1.stub.c:34:27: required from here
/usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/cuda_runtime.h:1346:23: warning: ‘cudaError_t cudaBindTexture(size_t*, const texture<T, dim, readMode>&, const void*, const cudaChannelFormatDesc&, size_t) [with T = float4; int dim = 1; cudaTextureReadMode readMode = (cudaTextureReadMode)0; cudaError_t = cudaError; size_t = long unsigned int]’ is deprecated [-Wdeprecated-declarations]
return cudaBindTexture(offset, tex, devPtr, tex.channelDesc, size);
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda-11.1/bin/…/targets/x86_64-linux/include/cuda_runtime.h:1293:53: note: declared here
static __CUDA_DEPRECATED __inline__ __host__ cudaError_t cudaBindTexture(

Thanks.

Regards

Manuel

giuseppe.bilotta · November 10, 2020, 6:47pm

Hello @Manuel,

thanks for the bug report. Your inability to run the CUDA samples is a strong indicator that there are issues with your CUDA installation, and fixing those on your side should take priority. In particular, make sure that your drivers are up to date and that you are using a single version of CUDA consistently (issues may arise if you compile and link against different versions of CUDA).
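
As a rough illustration of the version check, the CUDA API functions `cudaDriverGetVersion` and `cudaRuntimeGetVersion` report versions as integers encoded as 1000 × major + 10 × minor (11.1 becomes 11010). The Python sketch below decodes such integers and applies the basic rule that the driver must support at least the runtime version in use. The helper names are hypothetical and the input values are typed in by hand rather than queried from a device. Passing this check is necessary but does not by itself prove the installation is healthy.

```python
def decode_cuda_version(code):
    """Split an encoded CUDA version such as 11010 into (major, minor)."""
    return code // 1000, (code % 1000) // 10


def driver_supports_runtime(driver_code, runtime_code):
    """True if the driver's CUDA version is at least the runtime's."""
    return driver_code >= runtime_code


# Example values only: a driver reporting CUDA 11.1 and a 10.2 runtime.
driver_code, runtime_code = 11010, 10020
print(decode_cuda_version(driver_code))    # (11, 1)
print(decode_cuda_version(runtime_code))   # (10, 2)
print(driver_supports_runtime(driver_code, runtime_code))  # True
```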

That being said, I have received other reports of incompatibilities between GPUSPH and CUDA 11.1, independently of the hardware. However, the issue generally arises as unjustified illegal device memory accesses that crash most test cases, not as the issues you are seeing. I’m currently investigating the problem and adapting the code to the new requirements of CUDA 11. Some of the simplest fixes have already been pushed to the next branch (e.g. using C++14), but others (such as the transition from traditional compile-time, static, bound textures to bindless textures) require more significant changes to the code and have not been released yet.

For the time being, I would strongly recommend using the GPUSPH next branch (if you are not using it already) and, if possible, downgrading to CUDA 10.x (your GPU is a V100, which is supported by CUDA 10).

Cheers,

Giuseppe Bilotta

Manuel · November 13, 2020, 9:33am

Hello Giuseppe,

thank you for your reply and help.
As you suggested, I downgraded CUDA to 10.2 and switched to the “next” branch, and now everything works.

Thank you very much.

Regards

Manuel

giuseppe.bilotta · November 14, 2020, 12:32pm

Hello @Manuel, thanks for the update.

I’m glad that CUDA 10.2 is working. I’m still in the process of investigating (and hopefully fixing) the issues with CUDA 11. I’ll make an announcement when I have some positive news.

Cheers,

Giuseppe Bilotta