A race condition exists between the scheduled blob ID publishing process and the GC process that can resurrect the IDs of blobs already deleted by the GC. This is how it can happen:
1. MarkSweepGarbageCollector.collectGarbage() starts running.
2. As part of the preparation for sweeping, BlobIdTracker.globalMerge() is called, which merges all blob ID records from the blob store into the local tracker.
3. Sweeping begins deleting files.
4. BlobIdTracker.snapshot() gets called by the scheduler. It pushes all blob ID records that were collected and merged in step 2 back into the blob store, then deletes the local copies.
5. Sweeping completes and tries to remove the successfully deleted blobs from the tracker. Step 4 already deleted those records from the local files, so nothing gets removed.
The end result is that all blobs removed during the GC run are still considered alive, which causes warnings when later GC runs try to remove them again. The risk grows the longer the sweep runs, but the race can also hit a short, badly timed GC run. (We first observed it during a GC run that took more than 11 hours to complete.)
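The interleaving above can be reproduced with a minimal toy model. The classes and field layout below are illustrative only (two in-memory sets standing in for the local tracker files and the blob store records), not the real Oak implementation, but the method semantics follow the steps described above:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the tracker state; "local" stands in for the local
// tracker files, "remote" for the records in the blob store.
class ToyTracker {
    Set<String> local = new HashSet<>();
    Set<String> remote = new HashSet<>();

    // globalMerge(): pull all remote blob ID records into the local tracker
    void globalMerge() { local.addAll(remote); remote.clear(); }

    // snapshot(): push local records to the blob store, delete local copies
    void snapshot() { remote.addAll(local); local.clear(); }

    // remove(): drop successfully deleted blobs from the local records only
    void remove(Set<String> deleted) { local.removeAll(deleted); }
}

public class RaceDemo {
    public static void main(String[] args) {
        ToyTracker t = new ToyTracker();
        t.remote.add("blob-1");                       // a blob about to be GC'd

        t.globalMerge();                              // step 2: GC preparation
        Set<String> deleted = new HashSet<>(t.local); // step 3: sweep deletes blob-1
        t.snapshot();                                 // step 4: scheduler fires mid-sweep
        t.remove(deleted);                            // step 5: no-op, local records gone

        // blob-1 is still tracked, so later GC runs consider it alive
        System.out.println(t.remote.contains("blob-1")); // prints "true"
    }
}
```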
I can see two ways to approach this:
- Suspend the execution of BlobIdTracker.snapshot() while blob GC is in progress. This requires adding new methods to the BlobTracker interface for suspending and resuming snapshots of the tracker.
- Have the two overloads of BlobIdTracker.remove() do a globalMerge() before trying to remove anything. This ensures that even if a snapshot() call happened during the GC run, all IDs are "pulled back" into the local tracker and can be removed successfully.
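The second option can be demonstrated in the same toy model used for the race above (two in-memory sets standing in for the local tracker files and the blob store records; names are illustrative, not the real Oak classes). With globalMerge() running at the start of remove(), the badly timed snapshot() no longer loses the deleted IDs:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the second approach: remove() first pulls remote records
// back via globalMerge(), so IDs pushed out by a concurrent snapshot()
// can still be removed.
class MergingTracker {
    Set<String> local = new HashSet<>();
    Set<String> remote = new HashSet<>();

    void globalMerge() { local.addAll(remote); remote.clear(); }
    void snapshot()    { remote.addAll(local); local.clear(); }

    // Proposed fix: merge before removing anything
    void remove(Set<String> deleted) {
        globalMerge();              // pull back IDs moved out by snapshot()
        local.removeAll(deleted);
    }
}

public class FixDemo {
    public static void main(String[] args) {
        MergingTracker t = new MergingTracker();
        t.remote.add("blob-1");
        t.globalMerge();                              // GC preparation
        Set<String> deleted = new HashSet<>(t.local); // sweep deletes blob-1
        t.snapshot();                                 // badly timed snapshot
        t.remove(deleted);                            // fix: merge, then remove

        // blob-1 is no longer tracked anywhere
        System.out.println(t.local.contains("blob-1")
                || t.remote.contains("blob-1"));      // prints "false"
    }
}
```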