Jackrabbit Oak / OAK-10199

Skeleton of an additional, extendable "detail" garbage collector based on only "_modified"

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • None
    • Component/s: documentmk

    Description

      DocumentNodeStore's revision garbage collector does not currently clean up all garbage. Several such gaps have been identified so far, including:

      • OAK-8646 : "Clean up changes from orphaned branch commits"
      • OAK-10193 : "Garbage collect deleted properties"

      What these have in common is that cleaning up such garbage on an existing repository requires a full scan of the entire repository to find and delete it.

      The current working title for this is "detail gc".

      This ticket is about creating a skeleton of a garbage collector that the individual garbage types above can then "hook into".
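
      The "hook into" idea can be sketched as a small plug-in interface. This is only an illustration under assumptions: the names GarbageTypeHandler and DetailGcSkeleton are hypothetical and not part of Oak's API, and documents are modelled as plain maps rather than Oak's own document classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical hook: one implementation per garbage type (illustrative, not Oak API). */
interface GarbageTypeHandler {
    /** Inspects one document and returns how many garbage items it found in it. */
    int collect(Map<String, Object> document);
}

/** Hypothetical skeleton: walks the candidate documents once and offers each to every handler. */
class DetailGcSkeleton {
    private final List<GarbageTypeHandler> handlers = new ArrayList<>();

    /** Registers a handler for one garbage type, e.g. deleted properties. */
    void register(GarbageTypeHandler handler) {
        handlers.add(handler);
    }

    /** Runs all registered handlers over the candidates; returns the total garbage count reported. */
    int run(Iterable<Map<String, Object>> candidates) {
        int total = 0;
        for (Map<String, Object> doc : candidates) {
            for (GarbageTypeHandler handler : handlers) {
                total += handler.collect(doc);
            }
        }
        return total;
    }
}
```

      With this shape, each garbage type (such as those in OAK-8646 or OAK-10193) would contribute one handler, while the scan itself is shared.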

      There are two parts of the cleanup:

      • an initial, full repository scan
      • an iterative, continuous scan (e.g. after the above full scan has completed)

      The full repository scan is optional: one could decide to leave the existing garbage and not worry about it, while still enabling the continuous scan so that documents changed in the future are cleaned up lazily.

      While the two parts could in theory be based on different queries, they can also share the same query.

      One suggested query is to go through all documents whose "_modified" lies between the previous gc run and that value plus an increment, but is older than 'versionGcMaxAgeInSecs' (24h by default), while also taking e.g. checkpoints into account.

      A full repository scan is then characterized by setting this "previous gc run" pointer to zero.
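
      The window described above can be sketched as follows. This is a minimal sketch under assumptions: DetailGcWindow and its parameters are hypothetical names, all values are taken to be in seconds, and capping the window at the oldest checkpoint is just one possible way of "taking checkpoints into account".

```java
/** Hypothetical helper computing the "_modified" window for one detail gc run (not Oak API). */
final class DetailGcWindow {
    final long fromSecs; // inclusive lower bound on "_modified"
    final long toSecs;   // exclusive upper bound on "_modified"

    DetailGcWindow(long fromSecs, long toSecs) {
        this.fromSecs = fromSecs;
        this.toSecs = toSecs;
    }

    /**
     * @param previousRunSecs      pointer left by the previous run; 0 means full repository scan
     * @param incrementSecs        how far one run may advance the pointer
     * @param nowSecs              current time
     * @param maxAgeSecs           counterpart of versionGcMaxAgeInSecs (86400 for 24h)
     * @param oldestCheckpointSecs assumed cap from checkpoints; Long.MAX_VALUE if there are none
     */
    static DetailGcWindow next(long previousRunSecs, long incrementSecs, long nowSecs,
                               long maxAgeSecs, long oldestCheckpointSecs) {
        // Never look at documents younger than maxAge or newer than the oldest checkpoint.
        long upperLimit = Math.min(nowSecs - maxAgeSecs, oldestCheckpointSecs);
        long to = Math.min(previousRunSecs + incrementSecs, upperLimit);
        // If the pointer has already caught up, return an empty window.
        to = Math.max(to, previousRunSecs);
        return new DetailGcWindow(previousRunSecs, to);
    }

    boolean isEmpty() {
        return toSecs <= fromSecs;
    }
}
```

      In this sketch, the next run would start from the returned toSecs, so a full scan (pointer at zero) and the continuous scan differ only in their starting value.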

      For the full repository scan in particular, the gc needs to run in reasonably small batches and apply a voluntary throttle, to avoid overloading the system.
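
      Batching with a voluntary throttle could look roughly like the sketch below. ThrottledBatcher, batchSize and pauseMillis are hypothetical names chosen for illustration; a fixed sleep between batches is an assumption, and a real implementation might adapt the pause to system load instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: processes items in fixed-size batches, pausing after each full batch. */
final class ThrottledBatcher {

    /** Returns the number of batches handed to batchHandler. */
    static <T> int process(Iterable<T> items, int batchSize, long pauseMillis,
                           Consumer<List<T>> batchHandler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                batchHandler.accept(new ArrayList<>(batch)); // hand over a copy
                batch.clear();
                batches++;
                try {
                    Thread.sleep(pauseMillis); // voluntary throttle between batches
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop early
                    return batches;
                }
            }
        }
        if (!batch.isEmpty()) {
            batchHandler.accept(new ArrayList<>(batch)); // final, partially filled batch
            batches++;
        }
        return batches;
    }
}
```

      Smaller batches and longer pauses trade scan duration for lower load, which matters most during the initial full scan.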

            People

              daim Rishabh Daim
              stefanegli Stefan Egli