To better support garbage collection for the data store, I suggest adding a new method to AbstractBundlePersistenceManager:
/**
 * Get all node ids.
 * A typical application will call this method multiple times, where 'after'
 * is the last node id returned by the previous call. The maxCount parameter
 * defines the maximum number of node ids returned, 0 meaning no limit. The
 * order of the node ids is specific to the given persistence manager. Items
 * that are added concurrently may not be included.
 *
 * @param after the lower limit, or null for no limit.
 * @param maxCount the maximum number of node ids to return, or 0 for no limit.
 * @return an iterator over the node ids.
 * @throws ItemStateException if an error occurs while loading.
 */
public abstract NodeIdIterator getAllNodeIds(NodeId after, int maxCount)
        throws ItemStateException;
I would add this method only to the bundle persistence managers, because those are the most important ones (in my view).
This method is then called from the garbage collection process (or from a background thread from time to time, with a low maxCount and enough sleep time between calls). After all nodes have been processed, the objects in the data store that were never encountered during the scan are deleted. This mechanism is better than the current one because it can be restarted: only the last visited node id needs to be persisted. It is also more efficient, because the persistence manager can return the data in the order in which it is stored (which is easy for BundleFsPersistenceManager).
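To make the paging and restart behaviour concrete, here is a minimal sketch of the scan loop. It does not use the real Jackrabbit classes: InMemoryStore, scanFrom and the use of plain strings as node ids are hypothetical stand-ins, and the sorted order is just one possible persistence-manager-specific order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PagedScanSketch {

    /** Hypothetical stand-in for a persistence manager; node ids are plain strings. */
    static class InMemoryStore {
        private final TreeSet<String> ids = new TreeSet<>();

        void add(String id) {
            ids.add(id);
        }

        /**
         * Mirrors the proposed contract: returns the ids that come after 'after'
         * (null = start from the beginning), at most maxCount of them (0 = no limit).
         */
        List<String> getAllNodeIds(String after, int maxCount) {
            Iterable<String> source = (after == null) ? ids : ids.tailSet(after, false);
            List<String> page = new ArrayList<>();
            for (String id : source) {
                if (maxCount > 0 && page.size() >= maxCount) {
                    break;
                }
                page.add(id);
            }
            return page;
        }
    }

    /** Scans page by page, starting after 'checkpoint', and returns the visited ids. */
    static List<String> scanFrom(InMemoryStore store, String checkpoint, int pageSize) {
        List<String> visited = new ArrayList<>();
        String after = checkpoint;
        while (true) {
            List<String> page = store.getAllNodeIds(after, pageSize);
            if (page.isEmpty()) {
                break; // nothing left after the checkpoint
            }
            visited.addAll(page); // a real scan would mark referenced data store objects here
            after = page.get(page.size() - 1); // the only state that needs to be persisted
        }
        return visited;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        for (String id : new String[] {"a", "b", "c", "d", "e"}) {
            store.add(id);
        }
        System.out.println(scanFrom(store, null, 2)); // full scan: [a, b, c, d, e]
        System.out.println(scanFrom(store, "c", 2));  // restart after "c": [d, e]
    }
}
```

In this sketch a restart only needs the last id of the last completed page. A background thread would sleep between iterations of the loop instead of running it in one go.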
What do you think? Is this approach OK?