Description
Context
I'm using a TDB2 dataset in a long-running Scala application in which the dataset is compacted regularly. After each compaction, the application removes the previous generation's Data-xxxx folder. However, the corresponding disk space isn't returned to the OS: df still reports it as used. Indeed, lsof shows that the application keeps open file descriptors pointing to the old generation's files. Only stopping/restarting the JVM frees the disk space for good.
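This is standard POSIX unlink behaviour rather than anything Jena-specific: removing a file only removes its name, and the blocks are reclaimed when the last open descriptor on it is closed. A minimal shell sketch (file names chosen to mirror the TDB2 layout, purely for illustration) shows why df keeps counting the space:

```shell
# Unlinking a file does not free its blocks while a descriptor is open --
# the same mechanism that keeps the deleted Data-xxxx generation on disk.
tmp=$(mktemp -d)
printf 'tdb2 data' > "$tmp/Data-0001"
exec 3< "$tmp/Data-0001"      # the "JVM" keeps a descriptor open
rm "$tmp/Data-0001"           # the application removes the old generation
still_there=$( [ -e "$tmp/Data-0001" ] && echo yes || echo no )
first_bytes=$(head -c 4 <&3)  # the unlinked file is still readable via fd 3
echo "name exists: $still_there, readable: $first_bytes"
exec 3<&-                     # closing the descriptor finally frees the blocks
rm -rf "$tmp"
```

Until the JVM closes (or the process exits and the kernel closes) those descriptors, df counts the space while du, which walks directory entries, does not.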
Reproduction steps
- Connect to an existing TDB2 dataset:
  val dataset = TDB2Factory.connectDataset("sample")
- Check open files: open_files_before_compaction.png
- Compact the dataset:
  DatabaseMgr.compact(dataset.asDatasetGraph)
- Check open files (before garbage collection): open_files_after_compaction_before_gc.png
- Check open files (after garbage collection): open_files_after_compaction_after_gc.png
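The screenshot checks can also be reproduced from the command line. A small Linux-specific sketch (it inspects this shell's own /proc entry in place of the JVM's pid, which is an assumption for illustration):

```shell
# Linux-specific: a deleted-but-still-open file appears in /proc/<pid>/fd
# with a "(deleted)" suffix; for the real check, substitute the JVM's pid.
tmp=$(mktemp -d)
touch "$tmp/Data-0001"
exec 4< "$tmp/Data-0001"   # simulate the descriptor the JVM keeps open
rm "$tmp/Data-0001"        # simulate removing the old generation
leaked=$(ls -l "/proc/$$/fd" | grep -c 'Data-0001 (deleted)')
echo "deleted-but-open descriptors: $leaked"
exec 4<&-                  # once closed, the kernel frees the blocks
rm -rf "$tmp"
```

Where lsof is available, `lsof -p <pid> +L1` lists open files whose link count is zero, which is exactly what the screenshots show.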
The last screenshot shows that, even after garbage collection, there are still open file descriptors pointing to the old generation's Data-0001 folder.
Impact
Depending on how disk usage is reported, this can be quite problematic. In our case, we're running on an OpenShift infrastructure with limited storage: after only a handful of compactions, the storage is considered full and can no longer be used.
Attachments