Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
DocumentNodeStore's revision garbage collector currently does not clean up all garbage. Several gaps have been identified so far, including:
- OAK-8646: "Clean up changes from orphaned branch commits"
- OAK-10193: "Garbage collect deleted properties"
What these have in common is that cleaning up such garbage on an existing repository requires a full scan of the entire repository to find and delete it.
The current working title for this is "detail gc".
This ticket is about creating the skeleton of a garbage collector that the individual garbage types listed above can then "hook into".
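As an illustration only, the "hook into" idea could be sketched as a small handler interface that a scanning skeleton calls for every document it visits. All names here (`GarbageTypeHandler`, `DetailGcSkeleton`, the `Map`-based document) are hypothetical and not taken from Oak.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical hook: one implementation per garbage type
// (e.g. orphaned branch commits, deleted properties).
interface GarbageTypeHandler {
    /** Returns true if this handler found garbage of its type in the document. */
    boolean collect(Map<String, Object> document);
}

// Hypothetical skeleton: owns the scan and delegates detection to the handlers.
final class DetailGcSkeleton {
    private final List<GarbageTypeHandler> handlers = new ArrayList<>();

    void register(GarbageTypeHandler handler) {
        handlers.add(handler);
    }

    /** Offers each document to every handler; returns how many documents had garbage. */
    int scan(Iterable<Map<String, Object>> documents) {
        int documentsWithGarbage = 0;
        for (Map<String, Object> doc : documents) {
            boolean found = false;
            for (GarbageTypeHandler handler : handlers) {
                // Non-short-circuit OR so that every handler sees the document.
                found |= handler.collect(doc);
            }
            if (found) {
                documentsWithGarbage++;
            }
        }
        return documentsWithGarbage;
    }
}
```

With this shape, adding a new garbage type means registering one more handler; the scan itself stays unchanged.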
There are two parts of the cleanup:
- an initial, full repository scan
- an iterative, continuous scan (e.g. after the above full scan has completed)
The full repository scan is optional: one could decide to leave the existing garbage in place and enable only the continuous scan, which lazily cleans up documents as they are changed in the future.
While the two parts could in theory use different queries, they can also share the same query.
One suggested query is to go through all documents where "_modified" lies between the previous gc run and that point plus an increment, but is older than 'versionGcMaxAgeInSecs' (24h by default), while also taking e.g. checkpoints into account.
A full repository scan is then characterized by setting this "previous gc run" pointer to zero.
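The window selection described above can be sketched as follows. This is a minimal illustration that assumes all timestamps are in seconds; the class `GcWindow` and its parameters are hypothetical, and checkpoints are left out for brevity.

```java
// Hypothetical value type describing the "_modified" range for one gc pass.
final class GcWindow {
    final long fromSecs; // inclusive lower bound: the "previous gc run" pointer
    final long toSecs;   // exclusive upper bound

    GcWindow(long fromSecs, long toSecs) {
        this.fromSecs = fromSecs;
        this.toSecs = toSecs;
    }

    boolean isEmpty() {
        return fromSecs >= toSecs;
    }

    /**
     * Computes the next window: it starts at the previous gc run, advances by at
     * most one increment, and never reaches into documents younger than maxAgeSecs.
     */
    static GcWindow next(long previousRunSecs, long incrementSecs,
                         long nowSecs, long maxAgeSecs) {
        long oldestAllowed = nowSecs - maxAgeSecs;
        long to = Math.min(previousRunSecs + incrementSecs, oldestAllowed);
        // Clamp so that an exhausted window is reported as empty, not inverted.
        return new GcWindow(previousRunSecs, Math.max(to, previousRunSecs));
    }
}
```

Calling `GcWindow.next(0, ...)` corresponds to starting a full repository scan, while a continuous scan would pass the upper bound of the previous window as `previousRunSecs`.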
For the full repository scan in particular, the gc must run in reasonably small batches and apply a voluntary throttle, to avoid overloading the system.
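One possible shape for batching with a voluntary throttle is sketched below. `ThrottledBatcher`, the fixed pause, and the callback are assumptions made for illustration; an actual implementation might instead adapt the delay to the observed system load.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: processes candidates in small batches and pauses between them.
final class ThrottledBatcher {

    /** Returns the number of batches handed to processBatch. */
    static <T> int run(Iterator<T> candidates, int batchSize, long pauseMillis,
                       Consumer<List<T>> processBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (candidates.hasNext()) {
            batch.add(candidates.next());
            if (batch.size() == batchSize || !candidates.hasNext()) {
                // Hand over a copy so the callback may keep the list.
                processBatch.accept(new ArrayList<>(batch));
                batch.clear();
                batches++;
                if (candidates.hasNext() && pauseMillis > 0) {
                    try {
                        Thread.sleep(pauseMillis); // voluntary throttle between batches
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // preserve the interrupt flag
                        return batches;                     // stop early when interrupted
                    }
                }
            }
        }
        return batches;
    }
}
```

Because the pause only happens between batches, a scan that fits in a single batch is not delayed at all.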
Issue Links
- blocks
  - OAK-8646 Clean up changes from orphaned branch commits (Resolved)
  - OAK-10193 Garbage collect deleted properties (Resolved)
  - OAK-10535 Clean up old revisions in a document (Resolved)
- is related to
  - OAK-10378 Add metrics for detailed GC (Resolved)
  - OAK-10370 Dry-run mode for full GC (Resolved)
  - OAK-10689 Extend oak-run revisions command with "detail" garbage collection (Resolved)
- relates to
  - OAK-10676 Consider late-writes while removing deleted properties during detailedGC (Resolved)
  - OAK-10597 embedded verification for detailedGC (Resolved)