Hello @JoJo,
to ensure that all the neighbors of each particle are found within the Moore neighborhood of the particle's cell, the cell side must be no less than the search radius.
(The search radius itself must be no less than the smoothing kernel compact support radius kh, but it can be larger; in GPUSPH this can be tuned by changing the neighbors search expansion parameter. Using a search radius larger than the kernel support radius allows you to rebuild the neighbors list less frequently without losing interactions.)
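The trade-off behind the expansion parameter follows the standard Verlet-list reasoning: a pair currently farther apart than the search radius S can come within kernel range kh only after the two particles approach each other by more than S − kh. Here's a quick sketch of that bound (plain Python for illustration; the numbers and the helper function are made up, and this is not how GPUSPH actually schedules rebuilds):

```python
def rebuild_interval(kh, eps, vmax):
    """Lower bound on the number of steps the neighbors list stays valid.

    kh:   kernel compact support radius
    eps:  neighbors search expansion (S = (1 + eps) * kh)
    vmax: max displacement of a particle per time step
    """
    S = (1.0 + eps) * kh
    # two particles can close their mutual distance by at most 2*vmax per
    # step, so no interaction can be missed for at least this many steps
    return (S - kh) / (2.0 * vmax)

# hypothetical numbers, purely for illustration
steps = rebuild_interval(kh=0.1, eps=0.2, vmax=0.001)
assert abs(steps - 10.0) < 1e-9  # about 10 steps between rebuilds
```

A larger expansion means fewer rebuilds, at the cost of a longer neighbors list (more candidate pairs per interaction step).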
Note that the cell side can be larger than the search radius: the condition that all neighbors are found in the Moore neighborhood of the cell is still satisfied. It fails only when the cell side is smaller than the search radius.
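This can be checked numerically with a quick brute-force sketch (plain Python, purely for illustration, not GPUSPH code): when the cell side D is no less than the search radius S, any two particles within distance S of each other land in cells whose indices differ by at most 1 along each axis, i.e. one is in the Moore neighborhood of the other's cell.

```python
import math
import random

def cell_index(x, D):
    # 1D is enough: each axis is handled independently
    return math.floor(x / D)

S = 3.0   # search radius
D = 3.2   # cell side, chosen no less than S

random.seed(42)
for _ in range(100_000):
    x1 = random.uniform(0.0, 100.0)
    # second particle within the search radius of the first
    x2 = x1 + random.uniform(-S, S)
    # |x1 - x2| <= S <= D guarantees the cell indices differ by at most 1
    assert abs(cell_index(x1, D) - cell_index(x2, D)) <= 1
```

With D < S the assertion would eventually fail: a pair within the search radius can then straddle two non-adjacent cells.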
Now, to ensure that an integer number of cells spans the world size W, you want the number of cells C to be no more than W/S. We choose C = floor(W/S) <= W/S, since this guarantees that the cell side D satisfies D = W/C >= W/(W/S) = S.
Consider as an example the case W = 70 and S = 3; then W/S = 23.(3) and C = 23, so D = W/C = 70/23 = 3.0434… > 3: the cell side is no less than the search radius and the Moore neighborhood condition is satisfied. If instead we had picked the ceiling, the number of cells would have been 24 and the cell side W/C = 70/24 = 2.91(6) < 3: smaller than the search radius, with the potential to miss some neighbors.
(Note that in general in SPH missing a few far-away neighbors is not a big issue, since their contribution is small; but technically any code that uses the ceiling function to choose the number of cells is wrong.)
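The floor-vs-ceiling choice above is easy to verify numerically; a minimal sketch (plain Python, not GPUSPH code):

```python
import math

def cells_and_side(W, S, use_floor=True):
    """Number of cells C and resulting cell side D = W/C
    for world size W and search radius S."""
    ratio = W / S
    C = math.floor(ratio) if use_floor else math.ceil(ratio)
    return C, W / C

W, S = 70.0, 3.0

C, D = cells_and_side(W, S, use_floor=True)
# floor: C = 23, D = 70/23 = 3.0434… >= S, Moore condition holds
assert C == 23 and D >= S

C, D = cells_and_side(W, S, use_floor=False)
# ceiling: C = 24, D = 70/24 = 2.91(6) < S, neighbors can be missed
assert C == 24 and D < S
```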
In general the discrepancy between the cell side and the search radius is small, but at low resolutions and/or in small domains it can become large-ish.
Concerning the use of OpenMP to parallelize the host code, this is actually something we've been pondering ourselves. You can find some comments, e.g. in doWrite, about parallelizing some of the work that is run on the host. We've never actually done it because it's a very low-priority task compared to many other things we're working on.
For what it's worth, I don't think using OpenMP to parallelize some host tasks would lead to significant conflicts with the worker threads: their load is usually relatively small anyway (mostly busy-waiting for the GPU kernels to finish). At worst, I'd expect that under heavy load the next command in the chain may suffer from slightly higher latency, which is something we haven't optimized for yet anyway.
(For the future, please open a separate thread to ask unrelated questions, it helps keep the discussions organized ;-))