Synchronization between workers and hosts

Hi.
As some of you may know, I am trying to implement adaptive particle refinement (APR) based on GPUSPH, and it is almost complete.

Here is a simple test: a cube (red) in free fall with an initial velocity of z = -2 m/s, and a refinement area surrounding the cube. The figure below shows the device index.


It seems not bad, and the refinement area moves with the cube.
[image: device index]

However, I have found that my code is very fragile: crashes sometimes happen and sometimes not, and not at predictable iterations, which makes it very difficult to debug.

I suspect this is due to synchronization. I want to ask whether each (host) thread is synchronized at the end of each command (barrier()), and moreover, when a command needs to execute cudaMemcpy or cudaMemcpyToSymbol, whether the workers and the host are synchronized at the end of that command.

Hello @JoJo,

yes, command dispatch in GPUSPH is synchronous: every doCommand() is dispatched at the same time to all (GPU)Worker threads, and the threads themselves only fetch the next command when the previous one has completed. The only exception is the multi-GPU case when striping is enabled, where the forces kernel call is asynchronous (this is the difference between FORCES_SYNC, the synchronous version, and FORCES_ENQUEUE, the asynchronous one).

Note that even with the async version, all workers still fetch commands from the manager at the same time: the only difference is that the next doCommand() will be executed while the forces kernel is still running, instead of waiting for it to complete.
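To illustrate, here is a minimal sketch of this barrier-based dispatch model. It is not the actual GPUSPH code (the names NUM_WORKERS, worker_loop, g_command and dispatch_barrier are placeholders), just the general shape of it:

    // Minimal sketch of the synchronous dispatch model (NOT the actual
    // GPUSPH implementation): the manager posts a command, all workers
    // run it, and nobody proceeds until everyone has finished.
    #include <barrier>
    #include <functional>
    #include <thread>
    #include <vector>

    constexpr int NUM_WORKERS = 2;              // e.g. one thread per GPU

    std::function<void(int)> g_command;         // command posted by the manager
    bool g_quit = false;
    std::barrier dispatch_barrier(NUM_WORKERS + 1); // workers + manager

    void worker_loop(int device)
    {
        for (;;) {
            dispatch_barrier.arrive_and_wait(); // wait for the next command
            if (g_quit) return;
            g_command(device);                  // run it to completion...
            dispatch_barrier.arrive_and_wait(); // ...then report back
        }
    }

    // doCommand(): released to every worker at the same time, and it
    // only returns once all workers have completed it.
    void doCommand(std::function<void(int)> cmd)
    {
        g_command = std::move(cmd);
        dispatch_barrier.arrive_and_wait();     // release the workers
        dispatch_barrier.arrive_and_wait();     // wait for completion
    }

    int main()
    {
        std::vector<std::jthread> workers;
        for (int d = 0; d < NUM_WORKERS; ++d)
            workers.emplace_back(worker_loop, d);

        doCommand([](int dev) { /* e.g. launch a kernel on device dev */ });

        g_quit = true;                          // shut the workers down
        dispatch_barrier.arrive_and_wait();
        return 0;
    }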

For the code you introduced, you can ensure that kernel execution has completed before the end of the doCommand() execution by placing a KERNEL_CHECK_ERROR after the kernel launch, and for other CUDA API calls you can wrap them in CUDA_SAFE_CALL(...): this makes them synchronous and checks for error conditions at the call site. This should also help ensure that if an error arises, it is caught at the same place (even if possibly by different threads).
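For instance, the usage pattern looks like this (a sketch with placeholder names: my_apr_kernel, d_data, h_data, d_symbol and h_value are not actual GPUSPH identifiers):

    // Placeholder kernel and buffers, not actual GPUSPH identifiers.
    my_apr_kernel<<<numBlocks, blockSize>>>(d_data, numParticles);
    // waits for the kernel to complete and checks for errors
    KERNEL_CHECK_ERROR;

    // each API call is checked (and made effectively synchronous) at the
    // call site, so a failure is reported where it actually happened
    CUDA_SAFE_CALL(cudaMemcpy(h_data, d_data,
        numParticles * sizeof(float4), cudaMemcpyDeviceToHost));
    CUDA_SAFE_CALL(cudaMemcpyToSymbol(d_symbol, &h_value, sizeof(h_value)));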

Can you be a bit more specific about the crash you’re experiencing? It might be due to something else too (e.g. uninitialized or unallocated memory being accessed during an UPDATE_EXTERNAL).

Thanks for the reply.
I have now located the bug, and I was wrong: it has nothing to do with synchronization.

I wrote a new kernel, and in this kernel I use the construct below:
p_precalc_particle_data(*this)

However, it returns fluid_id = 4. Since I am running a two-phase model, the fluid number can only be 0 or 1, so when this happens there is obviously an illegal access. Moreover, it happens only "sometimes", not always.
I suspect it was caused by the this pointer, and I have fixed it by rewriting the function without using the this pointer.
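(A device-side assert, sketched below with hypothetical names, can catch such an out-of-range value at its source; rho0 stands in for a per-fluid array and fluid_id for the fluid number decoded from the particle info.)

    #include <cassert>

    __device__ float fluid_density(const float* rho0, int fluid_id)
    {
        // In a two-phase model the fluid number can only be 0 or 1; any
        // other value (such as 4) means the particle data was read before
        // it was initialized, and indexing with it is an illegal access.
        assert(fluid_id == 0 || fluid_id == 1);
        return rho0[fluid_id];
    }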

Anyway, thanks for the reply.

Hello,

the construction of p_precalc_particle_data with the this pointer should work, as long as the needed members of the structure have already been initialized. In particular, the structure should already have its info and vel (or relVel) members correctly initialized (this is why forces_particle_data derives from, and constructs, p_precalc_particle_data after common_particle_data and vel_particle_data). The order of initialization is important, because otherwise the function would access members of the structure that have not been initialized yet.
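To make the ordering concrete, here is a simplified sketch of the pattern (the member types and constructors are placeholders, not the exact GPUSPH definitions):

    #include <cassert>

    struct common_particle_data {
        int info;   // stands in for the particle info (type, fluid number, ...)
        common_particle_data(int info_) : info(info_) {}
    };

    struct vel_particle_data {
        float vel;  // stands in for the (relative) velocity
        vel_particle_data(float vel_) : vel(vel_) {}
    };

    struct p_precalc_particle_data {
        int fluid;
        // Reads pdata.info, so constructing this is only safe once the
        // base holding info has itself been constructed.
        template<typename PData>
        p_precalc_particle_data(const PData& pdata) : fluid(pdata.info) {}
    };

    // Bases are constructed in the order they are DECLARED here, not in
    // the order of the initializer list below: if p_precalc_particle_data
    // were listed first, it would read info before it was initialized.
    struct forces_particle_data :
        common_particle_data,
        vel_particle_data,
        p_precalc_particle_data
    {
        forces_particle_data(int info_, float vel_) :
            common_particle_data(info_),
            vel_particle_data(vel_),
            p_precalc_particle_data(*this) // safe: info and vel are set
        {}
    };

    int main()
    {
        forces_particle_data pd(1, -2.0f);
        assert(pd.fluid == 1); // fluid was copied from the initialized info
        return 0;
    }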