It seems that the .vtp file got corrupted somehow. The error message from ParaView (“the data array in the element may be too short”) indicates that the reader failed to find all the data. Since we write the data in “appended” format (basically: first the XML header describing all the data arrays, followed by a raw dump of the actual array data), this can only happen when the appended data could not be fully written for some reason.
The first step is to check that you indeed have enough data in the file. Since you have 57 million particles, storing the position (3 components, 8 bytes each) plus the connectivity and offsets (2 arrays of 4 bytes each) means 32 bytes (3×8 + 2×4) per particle on disk, or around 2 GB if my calculations are correct. If the file is smaller than that, then it’s definitely corrupt.
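To make the arithmetic explicit, a back-of-the-envelope check along these lines (just a sketch, not GPUSPH code) gives the expected amount of appended data:

```cpp
#include <cstdint>
#include <cstdio>

// Expected appended data for the reported case (sketch only): per particle we
// store the position (3 components, 8 bytes each) plus the connectivity and
// offsets entries (4 bytes each), i.e. 3*8 + 2*4 = 32 bytes.
int main()
{
    const std::uint64_t numParts = 57000000;          // roughly the reported particle count
    const std::uint64_t bytesPerParticle = 3*8 + 2*4; // 32 bytes
    const std::uint64_t expected = numParts * bytesPerParticle;
    std::printf("expected appended data: %llu bytes (~%.2f GiB)\n",
        static_cast<unsigned long long>(expected),
        expected / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```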
The VTK header you are showing is strange: there is no actual point data in the file. Did you comment out the parts in VTKWriter that append the various arrays? If so, you need to make sure that the line:
appender.write_appended_data();
remains uncommented; otherwise you only get the header information in the file, and not the actual data.
(Another thing that comes to mind is that maybe you’re storing the files on a FAT32 filesystem, but even there the file size limit is 4 GB, so a 2 GB file should still work.)
I only write the position, so it is normal that there is no point data. I commented the other arrays out because I want to make sure this problem has no relation to the data defined by my own code.
I also checked the file size, which is 1.71 GB (1802213 kB).
32 × 57670771 bytes ≈ 1802211 kB, so it seems quite normal.
My totParticles is 57670771. If I set numParts = 44,737,500, the data file is readable, while it is unreadable when I set numParts = 44,750,000.
Then I set numParts = 2500 and node_offset = 44,737,500 to write only these particles. That file is also readable.
Hello, I have been trying to fix this problem for the past few days.
I suspected something was wrong with my modifications. Unfortunately, I cannot find anything wrong.
Today I tested it on the original GPUSPH-5.0, both master and next, and found that the outputs are also unreadable.
I attach my test case here; the settings for my modifications are commented out. It can be
If you decide to test it, I suggest writing the data directly just after copy_to_array; in that case the device memory is not a concern. Also, we only need to write Position, to reduce the file size and the runtime.
The following is the information I have about this problem.
The example cases are fine, like Bubble with m_deltap = 0.128R/7 and totParticles of about 63,030,000.
If I set gdata->processParticles[0] = 44739242 in gdb, the output is readable, while it is unreadable with gdata->processParticles[0] = 44739243.
However, the particles around index 4473924* are themselves fine: the output can be read if I set gdata->processParticles[0] = 10000000 and node_offset = 44739242.
It is so weird. Since Bubble can be read normally with such a totParticles, there should be no hard limit on particle numbers or file size.
However, the output of the test case I give is unreadable.
The vtk_header I showed before and the file size indicate that the data may have been written successfully.
OK. Maybe I know what is going on… @giuseppe.bilotta
The problem seems to be in the function append_local_data in VTKWriter.cc.
If I am right, numbytes means the total size of the data array: for example, 3 * 8 * numParts for Position. This value equals the offset of the next data array, or is 4 bytes smaller than that.
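To make the relation concrete, this is how I understand numbytes and the offsets fit together (a sketch of my understanding, not the actual VTKWriter code; the 4-byte difference would be the per-array length header that the raw appended encoding uses):

```cpp
#include <cstddef>
#include <cstdint>

// In the raw appended encoding, each array's data is preceded by a 4-byte
// length header, so the offset of the next array is the current offset plus
// that header plus the numbytes of the current array.
std::size_t next_array_offset(std::size_t current_offset, std::size_t numbytes)
{
    return current_offset + sizeof(std::uint32_t) + numbytes;
}
```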
However, sizeof(T) equals 32, so why do we need to multiply by three?
This may show why 44739242 is good, while 44739243 is unreadable.
I attach the debug info here.
for numParts = 44739242
Obviously, a uint32 cannot represent these numbers!
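A minimal standalone check (not GPUSPH code) reproduces the wrap-around at exactly this particle count, assuming the buggy size is computed as 3 * sizeof(T) * numParts with sizeof(T) == 32 and stored in a 32-bit unsigned integer:

```cpp
#include <cstdint>
#include <cstdio>

int main()
{
    const std::uint64_t good = 44739242, bad = 44739243;
    // 3 * 32 bytes per particle, truncated to 32 bits as in the buggy computation
    const std::uint32_t bytes_good = static_cast<std::uint32_t>(3 * 32 * good); // 4294967232, still < 2^32
    const std::uint32_t bytes_bad  = static_cast<std::uint32_t>(3 * 32 * bad);  // 4294967328 wraps around to 32
    std::printf("numbytes for %llu particles: %u\n",
        static_cast<unsigned long long>(good), bytes_good);
    std::printf("numbytes for %llu particles: %u\n",
        static_cast<unsigned long long>(bad), bytes_bad);
    return 0;
}
```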
I will do some tests tomorrow.
I believe it should be numbytes = sizeof(T)*numParts; or 3 * sizeof(S)*numParts;
If we really do have to multiply by three, please tell me.
Another thing: can we change its type to size_t? That would be safer.
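Something along these lines is what I have in mind (a sketch with a hypothetical helper, not the actual append_local_data; S is assumed to be the component type, e.g. double):

```cpp
#include <cstddef>

// The Position array takes 3 * sizeof(S) bytes per particle; the result is
// std::size_t, so a large particle count can no longer wrap a 32-bit integer.
template<typename S>
std::size_t appended_array_bytes(std::size_t numParts)
{
    return 3 * sizeof(S) * numParts;
}

// e.g. appended_array_bytes<double>(57670771) == 1384098504 bytes
```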
Hello @JoJo, good spotting. The issue is indeed that the vector type rather than the component type was being used for the size computation. I’m guessing that when saving all the data arrays this went unnoticed because the file was still large enough, with the data from the other arrays hiding the bug. I’ve pushed a commit to next that should fix the issue.
Yeah, I do understand.
Actually, I am new to SPH (maybe this is my second year). This problem should not have been discovered by me; I mean, a large-scale problem should have been simulated in the past. I’m not blaming anyone. Just look at the SPH papers: dam break, water entry of a 2D wedge, 2D bubble rising…
All small cases. Indeed, one can simulate these cases to publish a paper or propose a new model.
But one day we need to deal with large-scale problems…
Also, I believe we can do more by building on a powerful code.
That is why I chose GPUSPH to develop my model: because of its architecture designed for large-scale computing.
There are several reasons why this particular bug was discovered by you. First of all, the new code for writing VTK files (where the bug was introduced) is relatively recent (June 2018). Moreover, for very large simulations so far we’ve usually gone multi-node: single GPUs capable of hosting 50M particles are quite recent. And finally, as you’ve found out while trying to make a reproducible test case, the actual number of particles that causes the issue is within a specific range, so it’s possible we just missed it until now! But it’s good that you’ve found it and that we could fix it.
I agree with you that the fact that most model test cases are relatively small (and 2D) is a significant limitation, both because 3D extensions of many formulations are often less trivial than one might think, and because in many practical applications one ends up running much larger problems, and not all codes are well-tested at those sizes. I’m glad you chose GPUSPH for this.