Description
Currently, if a running system has orphaned checkpoints, they prevent revision GC (compaction for segment) from being effective.
So far the practice has been to clean them up manually with the oak-run checkpoints rm-unreferenced command. The cleanup was kept manual because it was not possible to determine whether a given checkpoint is still in use. rm-unreferenced works on the assumption that checkpoints are only created by AsyncIndexUpdate, so it can check whether a checkpoint is in use by cross-checking it against the :async state. Running it automatically is risky because the checkpoint API can be used by any module.
With OAK-2314 we also record some metadata, such as creator and name, which can be used for automatic cleanup. For example, a running system lists the following checkpoints:
Mon Sep 19 18:02:09 EDT 2016 Sun Jun 16 18:02:09 EDT 2019 r15744787d0a-1-1 creator=AsyncIndexUpdate name=fulltext-async thread=sling-default-4070-Registered Service.653
Mon Sep 19 18:02:09 EDT 2016 Sun Jun 16 18:02:09 EDT 2019 r15744787d0a-0-1 creator=AsyncIndexUpdate name=async thread=sling-default-4072-Registered Service.656
Fri Aug 19 18:57:33 EDT 2016 Thu May 16 18:57:33 EDT 2019 r156a50612e1-1-1 creator=AsyncIndexUpdate name=async thread=sling-default-10-Registered Service.654
Wed Aug 10 12:13:20 EDT 2016 Tue May 07 12:25:52 EDT 2019 r156753ac38d-0-1 creator=AsyncIndexUpdate name=async thread=sling-default-6041-Registered Service.1966
As can be seen, the last two checkpoints are orphaned, and they would prevent revision GC. For auto mode we can use the following heuristic:
- List all current checkpoints
- Keep only the latest checkpoint for a given creator and name combination. Older entries for the same pair (by creation time) can be considered orphaned and deleted
This logic can be implemented in org.apache.jackrabbit.oak.checkpoint.Checkpoints and invoked by the Revision GC logic (in both DocumentNodeStore and SegmentNodeStore) to determine the base revision to keep.
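
The heuristic could be sketched roughly as follows. This is only an illustration under stated assumptions: CheckpointInfo and findOrphans are hypothetical names, not part of the existing Checkpoints class or any Oak API, and the creation time is assumed to be available as epoch milliseconds. Checkpoints without creator/name metadata are left untouched here, since the heuristic cannot say anything about them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrphanedCheckpoints {

    /** Hypothetical value type for one listed checkpoint; not an Oak class. */
    static final class CheckpointInfo {
        final String id;
        final long created;   // creation time in epoch millis (assumed)
        final String creator; // null if no metadata was recorded
        final String name;    // null if no metadata was recorded

        CheckpointInfo(String id, long created, String creator, String name) {
            this.id = id;
            this.created = created;
            this.creator = creator;
            this.name = name;
        }
    }

    /**
     * Returns the ids of checkpoints that are older than the latest
     * checkpoint sharing the same (creator, name) pair, in input order.
     */
    static List<String> findOrphans(List<CheckpointInfo> checkpoints) {
        // Pass 1: remember the most recently created checkpoint per pair.
        Map<List<String>, CheckpointInfo> latest = new HashMap<>();
        for (CheckpointInfo cp : checkpoints) {
            if (cp.creator == null || cp.name == null) {
                continue; // no metadata: cannot decide, so keep it
            }
            List<String> key = Arrays.asList(cp.creator, cp.name);
            CheckpointInfo seen = latest.get(key);
            if (seen == null || cp.created > seen.created) {
                latest.put(key, cp);
            }
        }
        // Pass 2: everything with metadata that is not the latest is an orphan.
        List<String> orphans = new ArrayList<>();
        for (CheckpointInfo cp : checkpoints) {
            if (cp.creator == null || cp.name == null) {
                continue;
            }
            if (latest.get(Arrays.asList(cp.creator, cp.name)) != cp) {
                orphans.add(cp.id);
            }
        }
        return orphans;
    }
}
```

Applied to the four checkpoints listed above, this would keep r15744787d0a-1-1 (fulltext-async) and r15744787d0a-0-1 (async) and report the two older async checkpoints as orphans. Revision GC could then take the oldest of the retained checkpoints as the base revision to keep.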