A simple demo of coupling GPUSPH with adaptive particle refinement, but it seems to be struggling with the existing Verlet list

Hi forum,

Basically, I implemented adaptive particle refinement on top of GPUSPH (the basic SPH formulation, MULTI-GPU and MULTI-NODE).

[Image: initial state, original example Bubble]

[Image: after about 5000 iterations]

But my code is very fragile, and the Verlet list is mostly to blame.
My code is based on this article:


According to this article, split inactive particles have to be projected onto the active layer they sit in to find their neighbors; we can do this via localpos -> globalpos -> localpos (target layer). The trouble is that the neighbor list we build actually stores a relative offset (the gridOffset plus the offset from cellStart), which means the particleHash must not change until the next BUILDNEIBS. For split inactive particles under the APR framework this requirement is not satisfied (and once that happens, we may get an illegal address). So at the very least we would have to store the projected particleHash, or else store the absolute offset for each neighbor, but that clearly requires more memory (ushort -> uint).
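To make the problem concrete, here is roughly how I understand the relative encoding (a minimal sketch with illustrative names and sizes, not the actual GPUSPH identifiers):

```cpp
#include <cstdint>

constexpr uint16_t NEIBS_PER_CELL = 2048; // assumed per-cell capacity

// encode: the neighbor lives in cell `cellOff` (0..26 around my cell),
// at position `idx` relative to that cell's start in the sorted arrays
uint16_t encodeNeib(uint16_t cellOff, uint16_t idx) {
    return cellOff * NEIBS_PER_CELL + idx;
}

// decode: recover the absolute sorted index. This is only valid while
// the particle's own hash (hence its 27-cell neighborhood) is the one
// used at BUILDNEIBS time -- if a split/projected particle changes
// layer or cell in between, neibCells/cellStart no longer match and
// the lookup dereferences a wrong (possibly illegal) address.
uint32_t decodeNeib(uint16_t neib, const uint32_t* neibCells,
                    const uint32_t* cellStart) {
    const uint16_t cellOff = neib / NEIBS_PER_CELL;
    const uint16_t idx     = neib % NEIBS_PER_CELL;
    return cellStart[neibCells[cellOff]] + idx;
}
```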

We can estimate the memory usage. For the physics arrays (vel, pos, and so on), we need no more than 300 bytes per particle; considering the multi-step scheme, triple that: 900 bytes. Then, for the neighbor list in 3D, if we assume each particle has 150 neighbors, that is another 300 bytes per particle. In other words, almost a third of the memory goes to the neighbor list alone... so a single GPU can only simulate on the order of 50 million particles, even on a Tesla V100.
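Making the arithmetic explicit (the byte counts are my assumptions above; the memory size and the real overheads depend on the card and the framework, so take the final number as an order of magnitude):

```cpp
#include <cstdio>

int main() {
    // assumptions from the estimate above
    const double physBytes   = 300.0 * 3;  // physics arrays x multi-step copies
    const double neibBytes   = 150 * 2.0;  // 150 neighbors x sizeof(ushort)
    const double perParticle = physBytes + neibBytes; // 1200 B/particle

    const double gpuMem = 32e9; // e.g. a 32 GB Tesla V100 (assumed)
    printf("neighbor list share: %.0f%%\n", 100 * neibBytes / perParticle);
    printf("max particles: ~%.0f million\n", gpuMem / perParticle / 1e6);
    return 0;
}
```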

Then consider APR: if we split a single cell 7 times, at the end it can generate more than 27 * 8^7 = 56,623,104 particles...

And sure, not everyone is interested in such a large simulation (not even me). But that's life...
So I am trying to implement a cell linked list instead.
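What I mean by cell linked list is the classic scheme: no stored neighbor list at all, just cellStart/cellEnd over the sorted particles and a walk of the 27 surrounding cells at interaction time. A generic sketch (illustrative, not GPUSPH code):

```cpp
#include <cstdint>

struct int3_ { int x, y, z; };

// linearized cell hash (row-major ordering assumed)
inline uint32_t cellHash(int3_ g, int3_ gridSize) {
    return (uint32_t)((g.z * gridSize.y + g.y) * gridSize.x + g.x);
}

// Assumes particles are sorted by cell hash, and cellStart/cellEnd
// give each cell's index range in the sorted arrays.
void forEachNeighbour(int3_ gridPos, int3_ gridSize,
                      const uint32_t* cellStart, const uint32_t* cellEnd,
                      void (*interact)(uint32_t neibIndex)) {
    for (int dz = -1; dz <= 1; ++dz)
    for (int dy = -1; dy <= 1; ++dy)
    for (int dx = -1; dx <= 1; ++dx) {
        int3_ g = { gridPos.x + dx, gridPos.y + dy, gridPos.z + dz };
        // skip cells outside the domain
        if (g.x < 0 || g.y < 0 || g.z < 0 ||
            g.x >= gridSize.x || g.y >= gridSize.y || g.z >= gridSize.z)
            continue;
        const uint32_t h = cellHash(g, gridSize);
        for (uint32_t j = cellStart[h]; j < cellEnd[h]; ++j)
            interact(j); // the actual distance check happens in interact()
    }
}
```

The price is redoing the cell walk at every interaction, but nothing in it depends on a hash frozen at BUILDNEIBS time.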
Finally, I have a question:

In particleSystem, all buffers that have no state are in the pool, is that right?

Hello @JoJo,

good to hear about your progress. I gave the paper a quick read, and if I understand correctly the authors also acknowledge that the growing number of neighbors is an issue with the Barcarolo approach.

Concerning the conversion of the local pos from one layer to the other, I think it should be possible to convert “directly”, or at least in a way that tries to preserve the accuracy of the local position, by computing appropriate integer offsets between the cell indices in the two layers first (I haven’t seen your code yet, so I don’t know how you’re doing it now).

Concerning your worry about the particleHash not being updated, keep in mind that it's possible to force a neighbor list rebuild at specific moments of the simulation if needed (for example, open boundaries force a rebuild whenever new particles are created). In your case, forcing a NL rebuild after a split is probably necessary.

The issue of the relative offset changing based on the target layer (if I understand the issue correctly) could be solved by storing the neighbors for different layers separately: either in a separate NL, or by using a mechanism similar to what we use for the different particle types (so e.g. per particle you could store: FLUID neibs of coarse layer ->, <- BOUNDARY neibs of coarse layer | FLUID neibs of fine layer ->, <- BOUNDARY neibs of fine layer).
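To illustrate the second option, here's a sketch of the idea (illustrative layout, sizes and names, not the actual GPUSPH implementation):

```cpp
#include <cstdint>

// Each particle owns one fixed-size row; each half of the row belongs
// to one layer. Within a half, FLUID neighbors grow from the front and
// BOUNDARY neighbors from the back, so the free space sits in the middle.
constexpr int ROW  = 256;      // entries per particle (assumed)
constexpr int HALF = ROW / 2;  // entries per layer
constexpr uint16_t NEIBS_END = 0xffff;

void initRow(uint16_t* row) {
    for (int i = 0; i < ROW; ++i) row[i] = NEIBS_END;
}

// layer: 0 = coarse, 1 = fine; returns false when the half is full
bool pushFluid(uint16_t* row, int layer, uint16_t neib,
               int& nFluid, int nBound) {
    if (nFluid + nBound >= HALF) return false;
    row[layer * HALF + nFluid++] = neib;
    return true;
}

bool pushBoundary(uint16_t* row, int layer, uint16_t neib,
                  int nFluid, int& nBound) {
    if (nFluid + nBound >= HALF) return false;
    row[layer * HALF + (HALF - 1 - nBound++)] = neib;
    return true;
}
```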

Concerning the buffers for the particle system, all the buffers that are added on initialization (see e.g. GPUWorker’s constructor) are in the pool until they are moved to a specific state. You can trace the changes by using the relevant debug flags (e.g. ./GPUSPH --debug print_step,inspect_buffer_lists will print each executed step, and how the buffer lists change because of it).

Hope this helps.

Yes, you mentioned the explosive growth of neighbors; it's absolutely a problem, and it's why I use a multi-grid: according to the paper, the neighbors of interest are then almost unchanged. But now the questions are:

(1) Multi-grid means multiple hashes and local positions, so should I store a hash and localpos per layer for every particle? I think that's a bad idea. What about storing the position offsets between cells on different layers? I don't think that's a good choice either, because it may generate a large array depending on the number of refinement layers. But the globalpos doesn't change, so gridpos + localpos -> globalpos, and from that I can compute the gridpos and localpos on the projected layer. With the VL, gridpos doesn't change until the next BUILDNEIBS; only localpos is updated at each sub-step. After EULER, if this localpos grows larger than the cell size (in x, y, or z), the conversion globalpos -> gridpos + localpos (projected layer) becomes a big trouble, because the resulting gridpos may differ from the gridpos used at BUILDNEIBS. As a result, we may get an illegal address. (Moreover, with the VL, BUILDNEIBS should not execute at every step, otherwise there is no advantage over the CLL.)

(2) You mentioned the accuracy of the local position. Yes, I believe this may be a problem; I am considering using a double-precision globalpos instead of hash + localpos.
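For clarity, this is the projection round trip I mean (a sketch with illustrative names, assuming the local position is relative to the cell center):

```cpp
#include <cmath>

struct double3_ { double x, y, z; };
struct float3_  { float x, y, z; };
struct int3_    { int x, y, z; };

// reconstruct the global position from gridPos + local pos on my layer
double3_ globalPos(int3_ g, float3_ lpos, double cellSide, double3_ origin) {
    return { origin.x + (g.x + 0.5) * cellSide + lpos.x,
             origin.y + (g.y + 0.5) * cellSide + lpos.y,
             origin.z + (g.z + 0.5) * cellSide + lpos.z };
}

// re-derive the gridPos on the target layer from the global position
int3_ gridPosOf(double3_ gp, double cellSide, double3_ origin) {
    return { (int)std::floor((gp.x - origin.x) / cellSide),
             (int)std::floor((gp.y - origin.y) / cellSide),
             (int)std::floor((gp.z - origin.z) / cellSide) };
}
// After EULER, if |lpos| exceeds the cell side, gridPosOf() returns a
// gridPos different from the one used at BUILDNEIBS, so the relative
// neighbor offsets are decoded against the wrong cell -> possible
// illegal address.
```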

An important thing to keep in mind is that in most cases you don't work directly with the hash (i.e. the encoded gridpos), but with the gridpos itself: this makes everything much easier. The only place where the hash is used directly is during the sorting phase.

It's possible to convert between the fine gridpos + lpos and the coarse gridpos + lpos without going through the global pos. This is easier and more accurate under some restrictions, such as power-of-two refinement and grid-aligned subdivisions, but it can be done in the arbitrary case as well. Note that if you do the conversion directly from grid + local to grid + local, the fact that the localpos is larger than the grid step won't be an issue.
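For example, under those restrictions (refinement ratio r, shared origin, local pos relative to the cell center) the conversion could look like this; just a sketch under those assumptions, not tested code:

```cpp
struct int3_   { int x, y, z; };
struct float3_ { float x, y, z; };

// Convert fine-layer (gridPos, localPos) to coarse-layer (gridPos,
// localPos) directly, assuming the coarse cell side is r times the
// fine one and both grids share the same origin.
void fineToCoarse(int3_ gf, float3_ lf, int r, float fineSide,
                  int3_& gc, float3_& lc) {
    // integer part: which coarse cell contains this fine cell
    gc = { gf.x / r, gf.y / r, gf.z / r }; // gf >= 0 assumed
    // offset of the fine cell center from the coarse cell center,
    // plus the original local pos; no global (large) coordinate is
    // ever formed, so no precision is lost
    lc = { (gf.x % r - (r - 1) * 0.5f) * fineSide + lf.x,
           (gf.y % r - (r - 1) * 0.5f) * fineSide + lf.y,
           (gf.z % r - (r - 1) * 0.5f) * fineSide + lf.z };
}
```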

The other question is: when do you actually need to do the conversion? To compute the distance between particles (i.e. during the interaction phase), you can just stick to a single grid for all particles. Even if you do everything at the coarse grid level, it's still going to be better than using the global pos (even in double precision).