  1. Jackrabbit Oak
  2. OAK-7246

Improve cleanup of locally copied index files

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.10.0, 1.9.7, 1.8.12
    • Component/s: lucene
    • Labels:
      None

      Description

      This task is to rethink how we clean up locally copied index files that are no longer in use.

      Current approach:

      1. Index writers, while creating index files, keep a list of the files currently being written.
        1. This list is cleared when a new index writer comes into play.
      2. The index tracker opens a new index (at a new revision) via observation.
        1. While the index is being opened, we also record the current directory listing of the local index files.
      3. While opening the new index, the tracker closes the index reader for the old revision.
        1. During this close, the local files recorded at open time are purged if they don't show up in the remote view of the index and aren't in the index writer's currently-being-written list.
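
The purge condition in step 3 amounts to a set difference. The following is a minimal sketch of that rule only, not the actual Oak implementation; the class name `PurgeRule`, the method `filesToPurge`, and its parameter names are hypothetical and chosen for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the purge rule described above (not Oak code).
final class PurgeRule {
    /**
     * Returns the local files that would be deleted when a reader closes:
     * files seen locally at open time that are neither in the reader's
     * remote view of the index nor in the writer's being-written list.
     */
    static Set<String> filesToPurge(Set<String> localAtOpen,
                                    Set<String> remoteView,
                                    Set<String> beingWritten) {
        Set<String> purge = new TreeSet<>(localAtOpen); // copy, inputs stay untouched
        purge.removeAll(remoteView);                    // still referenced remotely
        purge.removeAll(beingWritten);                  // still being written
        return purge;
    }
}
```

Note that the rule consults only the writer list as it stands at close time, so anything a previous writer produced is unprotected once that list has been cleared.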

      This approach, at least in the following timeline, would incur extra copying (and, as a side effect, would also open some index files directly off the remote input stream during CoWs):

      1. CoW1 creates [a, b]
      2. CoW2 starts and creates [c, d], removes [a, b] from remote
      3. CoR1 opens an index due to CoW1
        1. local-list-CoR1 = [a, b, c, d], remote-index-list=[a, b]
      4. CoW2 finishes
      5. CoW3 creates [e, f], removes [a,b] from remote
        1. CoW-currently-being-written-list=[e,f]
      6. CoR2 opens due to CoW2
        1. local-list-CoR2=[a,b,c,d,e,f], remote-index-list=[c,d]
      7. CoR1 closes
        1. deletes [c,d] as they aren't in its list of index files ([a,b]) and aren't part of the shared list ([e,f])
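
The end state of this timeline can be replayed with plain sets. This is a sketch under the assumption that the purge rule is exactly "local files at open time, minus the reader's remote list, minus the writer's current list"; the class `TimelineReplay` and its methods are hypothetical names, and the file lists are copied from the steps above as written.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical replay of the timeline above using plain sets (not Oak code).
final class TimelineReplay {
    static Set<String> setOf(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    // Assumed purge rule: local files absent from both the reader's remote
    // list and the writer's currently-being-written list.
    static Set<String> purgeOnClose(Set<String> localAtOpen,
                                    Set<String> remoteAtOpen,
                                    Set<String> beingWritten) {
        Set<String> purge = new TreeSet<>(localAtOpen);
        purge.removeAll(remoteAtOpen);
        purge.removeAll(beingWritten);
        return purge;
    }

    // Files deleted by CoR1's close that the newer reader CoR2 still needs.
    static Set<String> wronglyPurged() {
        Set<String> cor1Local = setOf("a", "b", "c", "d"); // step 3
        Set<String> cor1Remote = setOf("a", "b");          // step 3
        Set<String> writerList = setOf("e", "f");          // step 5, list reset by CoW3
        Set<String> cor2Remote = setOf("c", "d");          // step 6
        Set<String> purged = purgeOnClose(cor1Local, cor1Remote, writerList); // step 7
        purged.retainAll(cor2Remote);
        return purged;
    }

    public static void main(String[] args) {
        // prints "purged but still needed by CoR2: [c, d]"
        System.out.println("purged but still needed by CoR2: " + wronglyPurged());
    }
}
```

Under these assumptions, [c,d] would have to be copied again for CoR2, which is the extra copying the description refers to.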

      Disclaimer: the timeline might be off a bit (I haven't written a test yet), but the basic point is that a CoR could be working with one index file set while new files have come in twice since that CoR was opened; thus the shared list doesn't have complete information about the newly written files.

      Chetan Mehrotra, can you please check the timeline above? I'll try to work on a test case in the meantime.

        Attachments

        1. OAK-7246.patch
          41 kB
          Vikas Saurabh

              People

              • Assignee: Vikas Saurabh (catholicon)
              • Reporter: Vikas Saurabh (catholicon)
              • Votes: 0
              • Watchers: 2
