Currently, if a running system contains orphaned checkpoints, they prevent revision gc (compaction, for segment) from being effective.
So far the practice has been to clean them up manually with the oak-run checkpoints rm-unreferenced command. This was kept manual because it was not possible to determine whether a given checkpoint is still in use. rm-unreferenced works on the assumption that checkpoints are only created by AsyncIndexUpdate, so it can check whether a checkpoint is in use by cross-checking with the :async state. Doing this automatically is risky, as the checkpoint API can be used by any module.
Since OAK-2314 we also record some metadata with each checkpoint, such as creator and name. This can be used for automatic cleanup. For example, on a running system the following checkpoints are listed
As can be seen, the last 2 checkpoints are orphaned and would prevent revision gc. For auto mode we can use the following heuristic
- List all current checkpoints
- Only keep the latest checkpoint for a given creator and name combination. Other entries with the same pair that are older (by creation time) can be considered orphaned and deleted
This logic can be implemented in org.apache.jackrabbit.oak.checkpoint.Checkpoints and invoked by the Revision GC logic (in both DocumentNodeStore and SegmentNodeStore) to determine the base revision to keep
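
A minimal sketch of the heuristic described above, to make the grouping rule concrete. It is not the proposed patch: CheckpointInfo and findOrphans are hypothetical names, not existing Oak types, and the sketch assumes each checkpoint exposes an id, a creator, a name and a creation time. It also assumes that checkpoints without creator or name metadata are left alone, since nothing can be inferred about them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrphanedCheckpoints {

    /** Hypothetical view of a checkpoint and its metadata; not an Oak class. */
    static final class CheckpointInfo {
        final String id;
        final String creator; // may be null if no metadata was recorded
        final String name;    // may be null if no metadata was recorded
        final long created;   // creation time in milliseconds

        CheckpointInfo(String id, String creator, String name, long created) {
            this.id = id;
            this.creator = creator;
            this.name = name;
            this.created = created;
        }
    }

    /** Builds the grouping key for a creator/name pair. */
    private static String key(CheckpointInfo cp) {
        return cp.creator + '\u0000' + cp.name;
    }

    /**
     * Returns the ids of checkpoints considered orphaned: every checkpoint
     * that is not the latest one for its creator/name pair. Checkpoints
     * without metadata are never reported (assumption of this sketch).
     */
    static List<String> findOrphans(List<CheckpointInfo> checkpoints) {
        // First pass: remember the most recent checkpoint per creator/name pair.
        Map<String, CheckpointInfo> latest = new HashMap<>();
        for (CheckpointInfo cp : checkpoints) {
            if (cp.creator == null || cp.name == null) {
                continue;
            }
            CheckpointInfo current = latest.get(key(cp));
            if (current == null || cp.created > current.created) {
                latest.put(key(cp), cp);
            }
        }
        // Second pass: anything with metadata that is not the latest is orphaned.
        List<String> orphans = new ArrayList<>();
        for (CheckpointInfo cp : checkpoints) {
            if (cp.creator == null || cp.name == null) {
                continue;
            }
            if (latest.get(key(cp)) != cp) {
                orphans.add(cp.id);
            }
        }
        return orphans;
    }
}
```

Revision GC could then treat the oldest checkpoint that is not in the returned list as the base revision to keep. If two checkpoints of the same pair share a creation time, this sketch keeps the one listed first.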