Lucene - Core
  1. Lucene - Core
  2. LUCENE-5006

Simplify / understand IndexWriter/DocumentsWriter synchronization

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The concurrency in IW/DW/BD is terrifying: there are many locks involved, not just intrinsic locks but IW also has fullFlushLock, commitLock, and there are no clear rules about lock order to avoid deadlocks like LUCENE-5002.

      We have to somehow simplify this, and define the allowed concurrent behavior eg when an app calls deleteAll while other threads are indexing.

      1. LUCENE-5006.patch
        79 kB
        Simon Willnauer
      2. LUCENE-5006.patch
        78 kB
        Simon Willnauer
      3. LUCENE-5006.patch
        62 kB
        Simon Willnauer

        Activity

        Hide
        Simon Willnauer added a comment -

        here is a first/rough cut with some nocommits but it shows the idea. This patch removes all dependencies to DocumentsWriter and IndexWriter from DWPT. I factored everything out and made DWPT basically a non-reuseable class. It initializes itself entirely in the ctor including the segment name etc.

        At the same time DWPTThreadPool doesn't initialize DWPT anymore it only handles ThreadStates and pooling and the actual DWPT creation is done on the ThreadPools consumer level. This makes the pool much simpler as well.

        DocumentsWriter doesn't communicate with the IW directly. It only creates certain events that will be processed by the IW once the DW action is done. This works since all the Events are kind of atomic operations which can be executed once we exit DW. For instance if we flush a segment the DW doesn't push it up to the IW but once DW returns IW proceses the event an pulls the new flushed segment in. The same is true for pending merges etc.

        This simplifies the locking a lot here since we basically can't deadlock anymore from DW or DWPT since we don't even have a reference to IW anymore. The only remaining thing is when we create a new DWPT we need to call into IW to get a new seg. name but it's a start.

        Show
        Simon Willnauer added a comment - here is a first/rough cut with some nocommits but it shows the idea. This patch removes all dependencies to DocumentsWriter and IndexWriter from DWPT. I factored everything out and made DWPT basically a non-reuseable class. It initializes itself entirely in the ctor including the segment name etc. At the same time DWPTThreadPool doesn't initialize DWPT anymore it only handles ThreadStates and pooling and the actual DWPT creation is done on the ThreadPools consumer level. This makes the pool much simpler as well. DocumentsWriter doesn't communicate with the IW directly. It only creates certain events that will be processed by the IW once the DW action is done. This works since all the Events are kind of atomic operations which can be executed once we exit DW. For instance if we flush a segment the DW doesn't push it up to the IW but once DW returns IW proceses the event an pulls the new flushed segment in. The same is true for pending merges etc. This simplifies the locking a lot here since we basically can't deadlock anymore from DW or DWPT since we don't even have a reference to IW anymore. The only remaining thing is when we create a new DWPT we need to call into IW to get a new seg. name but it's a start.
        Hide
        Michael McCandless added a comment -

        Thanks Simon.

        It looks like most of the changes are 1) the new Events system for DW
        to communicate back to IW when something needs to happen (flush,
        applyDeletes, etc.) so that DW never needs to call back into a sync'd
        IW method, and 2) passing IW as an argument to various methods instead
        of holding it as a member in DW? (Well and also adding IW as a member
        in DocumentsWriterFlushControl).

        Why does applyAllDeletes need a new boolean forced param?

        Looks like BytesRefHash has accidental new int[] hashes?

        What is flagForcePurge for? Can it somehow be passed as a param to
        the ApplyDeletesEvent?

        Maybe rename "boolean iwCheck" to "boolean iwHasEvents" in
        DW.postUpdate? This returned boolean means new Events were added for
        IW to process right?

        There are some parts I don't quite understand, e.g. the changing logic
        for when initializeThreadState or deactivateThreadState or
        applyAllDeletes are invoked...

        finishFlush has too much indentation.

        Show
        Michael McCandless added a comment - Thanks Simon. It looks like most of the changes are 1) the new Events system for DW to communicate back to IW when something needs to happen (flush, applyDeletes, etc.) so that DW never needs to call back into a sync'd IW method, and 2) passing IW as an argument to various methods instead of holding it as a member in DW? (Well and also adding IW as a member in DocumentsWriterFlushControl). Why does applyAllDeletes need a new boolean forced param? Looks like BytesRefHash has accidental new int[] hashes? What is flagForcePurge for? Can it somehow be passed as a param to the ApplyDeletesEvent? Maybe rename "boolean iwCheck" to "boolean iwHasEvents" in DW.postUpdate? This returned boolean means new Events were added for IW to process right? There are some parts I don't quite understand, e.g. the changing logic for when initializeThreadState or deactivateThreadState or applyAllDeletes are invoked... finishFlush has too much indentation.
        Hide
        Simon Willnauer added a comment -

        Here is a cleaned-up version of the patch.

        I removed the accidentally added (leftover) int[] from BytesRefHash that was indeed unintended.

        I also removed all the leftovers like forcePurge and applyDeletes flags they were still in there from a previous iteration without the Queue. I changed maybeMerge to hasEvents consistently.

        The changes in DWPT and DWPTThreadPool are mainly due to the fact that I move the creation of DWPT into DW and out of the ThreadPool. The ThreadPool only maintains the ThreadState instances but is not responsible for creating the actual DWPT. DWPT is now not "reuseable" anymore, yet we never really reused them but if they were initialized and we did a full flush we kept using them with a new DeleteQueue which is gone now. This is nice since DWPT is now solely initialized in its Ctor. This includes the segment name which we obtain from IW when the DWPT is created. This remains the only place where we sync on IW which is done in updateDocument right now.

        I think this patch is a step into the right direction making this simpler, at the end of the day I'd want to change the lifetime of a DW to be a single flush and replace the entire DW once we flush or reopen. This would make a lot of logic much simpler but I don't want to make this big change at once so maybe we should work to get the current patch into trunk and let it bake in a bit.

        Show
        Simon Willnauer added a comment - Here is a cleaned-up version of the patch. I removed the accidentally added (leftover) int[] from BytesRefHash that was indeed unintended. I also removed all the leftovers like forcePurge and applyDeletes flags they were still in there from a previous iteration without the Queue. I changed maybeMerge to hasEvents consistently. The changes in DWPT and DWPTThreadPool are mainly due to the fact that I move the creation of DWPT into DW and out of the ThreadPool. The ThreadPool only maintains the ThreadState instances but is not responsible for creating the actual DWPT. DWPT is now not "reuseable" anymore, yet we never really reused them but if they were initialized and we did a full flush we kept using them with a new DeleteQueue which is gone now. This is nice since DWPT is now solely initialized in its Ctor. This includes the segment name which we obtain from IW when the DWPT is created. This remains the only place where we sync on IW which is done in updateDocument right now. I think this patch is a step into the right direction making this simpler, at the end of the day I'd want to change the lifetime of a DW to be a single flush and replace the entire DW once we flush or reopen. This would make a lot of logic much simpler but I don't want to make this big change at once so maybe we should work to get the current patch into trunk and let it bake in a bit.
        Hide
        Michael McCandless added a comment -

        +1, thanks Simon!

        Show
        Michael McCandless added a comment - +1, thanks Simon!
        Hide
        Simon Willnauer added a comment -

        mike since I am going on vacation and I am not very keen to watch our builds failing with sneaky concurrency problems I created a branch CI build from my github repo to stress this a bit while I am on vacation... http://builds.flonkings.com/job/DWPT_Refactoring/ one I come back I can get this in... I think this makes most sense.

        Show
        Simon Willnauer added a comment - mike since I am going on vacation and I am not very keen to watch our builds failing with sneaky concurrency problems I created a branch CI build from my github repo to stress this a bit while I am on vacation... http://builds.flonkings.com/job/DWPT_Refactoring/ one I come back I can get this in... I think this makes most sense.
        Hide
        Simon Willnauer added a comment -

        Here is an updated version of the patch that was running CI builds for over a month without a failure here I think we are ready to commit this here.

        Show
        Simon Willnauer added a comment - Here is an updated version of the patch that was running CI builds for over a month without a failure here I think we are ready to commit this here.
        Hide
        ASF subversion and git services added a comment -

        Commit 1515459 from Simon Willnauer in branch 'dev/trunk'
        [ https://svn.apache.org/r1515459 ]

        LUCENE-5006: Simplified DW and DWPT synchronization and concurrent interaction with IW

        Show
        ASF subversion and git services added a comment - Commit 1515459 from Simon Willnauer in branch 'dev/trunk' [ https://svn.apache.org/r1515459 ] LUCENE-5006 : Simplified DW and DWPT synchronization and concurrent interaction with IW
        Hide
        ASF subversion and git services added a comment -

        Commit 1515463 from Simon Willnauer in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1515463 ]

        LUCENE-5006: Simplified DW and DWPT synchronization and concurrent interaction with IW

        Show
        ASF subversion and git services added a comment - Commit 1515463 from Simon Willnauer in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1515463 ] LUCENE-5006 : Simplified DW and DWPT synchronization and concurrent interaction with IW

          People

          • Assignee:
            Simon Willnauer
            Reporter:
            Michael McCandless
          • Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development