A little suggestion about writers and output files

The most important writers are HOTWRITER and VTKWRITER.
I have noticed that both of these writers run at every save-freq interval, but only the last N checkpoints (N is set by the user) are kept; the older ones are deleted.
I think this has some drawbacks.
One is that there is a lot of duplication between the two sets of files (.bin and .vtk).
The other is that the VTK files are very large: I ran a simulation with 0.1 billion particles, which produced an 11 GB VTK file.

Here is my suggestion:

  1. During the simulation, only activate HOTWRITER, without deleting its checkpoints.
  2. Provide a format translator to convert the .bin files to .vtk.

In this way the simulation should be slightly faster, because the post-processing procedure and VTKWRITER are disabled during the run, and the storage requirements are reduced.
The post-processing can be executed in that translator instead (see the sketch below).
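To make the idea concrete, here is a minimal sketch of such a translator. The checkpoint layout assumed here (a particle count followed by position and velocity arrays) is only a placeholder, not the real HotFile format, and the names `load_checkpoint` and `write_legacy_vtk` are hypothetical; a real tool would reuse GPUSPH's own HotFile-reading code and produce the same output the VTKWriter does.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Placeholder checkpoint layout (NOT the real HotFile format):
// uint64 particle count, then N*3 doubles of positions, then N*3 doubles of velocities.
struct Checkpoint {
    std::vector<double> pos; // x,y,z interleaved
    std::vector<double> vel; // vx,vy,vz interleaved
};

static Checkpoint load_checkpoint(const std::string &path)
{
    Checkpoint cp;
    FILE *f = std::fopen(path.c_str(), "rb");
    if (!f) return cp;
    uint64_t n = 0;
    if (std::fread(&n, sizeof n, 1, f) == 1) {
        cp.pos.resize(3 * n);
        cp.vel.resize(3 * n);
        std::fread(cp.pos.data(), sizeof(double), cp.pos.size(), f);
        std::fread(cp.vel.data(), sizeof(double), cp.vel.size(), f);
    }
    std::fclose(f);
    return cp;
}

// Write a legacy ASCII VTK point cloud; any post-processing would also go here.
static void write_legacy_vtk(const Checkpoint &cp, const std::string &path)
{
    const size_t n = cp.pos.size() / 3;
    FILE *f = std::fopen(path.c_str(), "w");
    if (!f) return;
    std::fprintf(f, "# vtk DataFile Version 3.0\nconverted checkpoint\nASCII\n");
    std::fprintf(f, "DATASET POLYDATA\nPOINTS %zu double\n", n);
    for (size_t i = 0; i < n; ++i)
        std::fprintf(f, "%g %g %g\n", cp.pos[3*i], cp.pos[3*i+1], cp.pos[3*i+2]);
    std::fprintf(f, "POINT_DATA %zu\nVECTORS velocity double\n", n);
    for (size_t i = 0; i < n; ++i)
        std::fprintf(f, "%g %g %g\n", cp.vel[3*i], cp.vel[3*i+1], cp.vel[3*i+2]);
    std::fclose(f);
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; ++i) {
        const std::string in(argv[i]);
        const std::string out = in.substr(0, in.rfind('.')) + ".vtk";
        write_legacy_vtk(load_checkpoint(in), out);
        std::printf("%s -> %s\n", in.c_str(), out.c_str());
    }
    return 0;
}
```

The post-processing that is currently done before the VTKWRITER runs would simply move into the writing step of this tool, and could be run offline on whichever checkpoints the user actually wants to visualize.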

Hello @JoJo,

yes, this is something we’ve been thinking about too. There are a few problems with the approach, such as the fact that HotFiles by themselves do not have enough information to reconstruct everything that gets saved by the VTKWriter, which is part of the reason why the .vtk files are so large.

The idea of having some kind of converter, possibly based on the GPUSPH code itself, that restores the HotFile, runs the post-processing needed by the VTK writer, and saves the result is on our wishlist. Another alternative to reduce disk space usage would be to write out compressed VTK files. This could save quite a bit of space too, and should also be easier to implement.
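To give an idea of what the compressed-VTK route would look like: assuming one is willing to link against the VTK library (or to reproduce the same zlib-compressed appended encoding in our own writer), the XML writers already support compressed data blocks, so it is mostly a matter of configuring the writer. A minimal, stand-alone example with a single dummy particle:

```cpp
#include <vtkSmartPointer.h>
#include <vtkPoints.h>
#include <vtkUnstructuredGrid.h>
#include <vtkXMLUnstructuredGridWriter.h>

int main()
{
    // one dummy particle, just to have something to write
    vtkSmartPointer<vtkPoints> points = vtkSmartPointer<vtkPoints>::New();
    points->InsertNextPoint(0.0, 0.0, 0.0);

    vtkSmartPointer<vtkUnstructuredGrid> grid =
        vtkSmartPointer<vtkUnstructuredGrid>::New();
    grid->SetPoints(points);

    vtkSmartPointer<vtkXMLUnstructuredGridWriter> writer =
        vtkSmartPointer<vtkXMLUnstructuredGridWriter>::New();
    writer->SetFileName("particles.vtu");
    writer->SetInputData(grid);
    writer->SetDataModeToAppended();   // binary appended data blocks instead of ASCII
    writer->SetCompressorTypeToZLib(); // compress each block with zlib
    writer->Write();
    return 0;
}
```

How much this actually saves depends on the data: particle positions compress poorly, but fields with lots of repeated values (particle type, flags, etc.) compress very well.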

(And of course, the duplication can be minimized by saving only the last 1 or 2 checkpoints.)

We’ve been fighting against the huge disk space usage for a long time, and there has been a lot of progress since (imagine: we were using the ASCII VTK formats before, which are even larger). However, it’s still a relatively low-priority issue compared to other, more work-related changes, so nobody has really gotten to work on these items yet (also, async writing, so that the simulation can go on while the data is written to disk, would be nice to have).
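For completeness, the async-writing idea boils down to something like the sketch below: the simulation thread hands a snapshot to a background writer thread and keeps integrating while the data is written. None of the names here (AsyncWriter, Snapshot, write_to_disk) exist in GPUSPH; this is just an illustration of the pattern.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative snapshot: just a time stamp and interleaved particle positions.
struct Snapshot {
    double t;
    std::vector<double> pos;
};

class AsyncWriter {
    std::queue<Snapshot> pending;
    std::mutex mtx;
    std::condition_variable cv;
    bool done = false;

    // Stand-in for the real writers (HotWriter/VTKWriter).
    static void write_to_disk(const Snapshot &s) {
        std::printf("wrote snapshot at t=%g (%zu particles)\n",
                    s.t, s.pos.size() / 3);
    }

    // Background loop: pop snapshots and write them without blocking the producer.
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mtx);
            cv.wait(lock, [this] { return done || !pending.empty(); });
            if (pending.empty())
                return; // done was set and the queue is drained
            Snapshot snap = std::move(pending.front());
            pending.pop();
            lock.unlock();
            write_to_disk(snap);
        }
    }

    std::thread worker{[this] { run(); }};

public:
    // Called by the simulation thread: queue the snapshot and return immediately.
    void enqueue(Snapshot s) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            pending.push(std::move(s));
        }
        cv.notify_one();
    }

    ~AsyncWriter() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            done = true;
        }
        cv.notify_one();
        worker.join(); // drains whatever is still queued before exiting
    }
};

int main() {
    AsyncWriter writer;
    for (int step = 0; step < 3; ++step)
        writer.enqueue(Snapshot{0.1 * step, std::vector<double>(3 * 4, 0.0)}); // 4 dummy particles
    // the destructor waits for the background thread to finish writing
}
```

The catch is the extra host memory for the queued copies, which is non-negligible with 0.1 billion particles, so in practice the queue would have to be bounded (e.g. a single double buffer).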

Ideally, these would be tackled by someone from a computer science or similar course. They make good material for a master’s project or, within a wider framework, as part of a PhD (similar to our multi-GPU/multi-node implementation, which was developed by a PhD student and a post-doc in computer science). We’re always looking for young researchers who might be interested, but without much luck so far 8-/