Lucene - Core / LUCENE-7570

Tragic events during merges can lead to deadlock

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5, master (7.0)
    • Fix Version/s: master (7.0), 5.5.4, 6.4
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

      Description

      When an IndexWriter#commit() is stalled due to too many pending merges, you can get a deadlock if the currently active merge thread hits a tragic event.

      1. The thread performing the commit synchronizes on the commitLock in commitInternal.
      2. The thread goes on to call ConcurrentMergeScheduler#doStall(), which wait()s on the ConcurrentMergeScheduler object. This releases the merge scheduler's monitor lock, but not the commitLock in IndexWriter.
      3. Sometime after this wait begins, the merge thread gets a tragic exception and calls IndexWriter#tragicEvent(), which in turn calls IndexWriter#rollbackInternal().
      4. IndexWriter#rollbackInternal() synchronizes on the commitLock, which is still held by the committing thread from (1) above, which in turn is waiting for the merge(s) to complete. Hence, deadlock.
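
The four steps above can be sketched with two plain Java monitors. This is only an illustration of the locking pattern, not Lucene code: the class name, the `stalled` flag, and the `reproduce()` helper are hypothetical stand-ins, and the two `Object` fields merely play the roles of IndexWriter's commitLock and the ConcurrentMergeScheduler monitor. The threads are daemons, so the stuck pair does not keep the JVM alive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in classes and names; none of this is Lucene's API.
class CommitStallDeadlockSketch {
    private final Object commitLock = new Object(); // plays IndexWriter's commitLock
    private final Object scheduler = new Object();  // plays the merge scheduler monitor
    private boolean stalled = true;                 // guarded by scheduler

    /** Starts both threads and returns {commit state, merge state} as last observed. */
    static Thread.State[] reproduce() throws InterruptedException {
        CommitStallDeadlockSketch s = new CommitStallDeadlockSketch();
        CountDownLatch commitHoldsLock = new CountDownLatch(1);

        Thread commit = new Thread(() -> {
            synchronized (s.commitLock) {            // step 1: take commitLock
                commitHoldsLock.countDown();
                synchronized (s.scheduler) {         // step 2: stall on the scheduler
                    while (s.stalled) {
                        try {
                            s.scheduler.wait();      // releases scheduler, keeps commitLock
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }
        }, "commit");

        Thread merge = new Thread(() -> {
            try {
                commitHoldsLock.await();
            } catch (InterruptedException e) {
                return;
            }
            synchronized (s.commitLock) {            // steps 3-4: rollback needs commitLock
                synchronized (s.scheduler) {
                    s.stalled = false;               // never reached while commit holds the lock
                    s.scheduler.notifyAll();
                }
            }
        }, "merge");

        commit.setDaemon(true);
        merge.setDaemon(true);
        commit.start();
        merge.start();

        // Poll for up to five seconds until both threads are stuck.
        Thread.State commitState = commit.getState();
        Thread.State mergeState = merge.getState();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline
                && !(commitState == Thread.State.WAITING && mergeState == Thread.State.BLOCKED)) {
            Thread.sleep(10);
            commitState = commit.getState();
            mergeState = merge.getState();
        }
        return new Thread.State[] { commitState, mergeState };
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] states = reproduce();
        System.out.println("commit thread: " + states[0] + ", merge thread: " + states[1]);
    }
}
```

Note that the committing thread is waiting on a monitor rather than blocked on one, so the two threads do not form a pure lock cycle; a thread dump showing one thread waiting and the other blocked on the commit lock is the more reliable way to spot this pattern.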

      We hit this bug with Lucene 5.5, but I looked at the code in the master branch and it looks like the deadlock still exists there as well.

      Attachments

      1. LUCENE-7570.patch (6 kB, Michael McCandless)
      2. thread_dump.txt (11 kB, Martin Amirault)

        Activity

        marumarutan Martin Amirault added a comment -

        Reproduced in production with Lucene 6.1.
        Attaching an extract from the thread dump taken when it reproduced.

        mikemccand Michael McCandless added a comment -

        Thanks for reporting this Martin Amirault, I'll have a look.

        mikemccand Michael McCandless added a comment -

        And thank you Joey Echeverria!

        mikemccand Michael McCandless added a comment -

        Here's a patch w/ test case reproducing the deadlock, and a simple fix, that just postpones launching merges until after we are out of the commit lock.
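
The idea behind the fix, as described in the comment above, can be sketched as follows. This is not the actual patch: the class, method, and field names are hypothetical, and it assumes for simplicity that the merge thread clears the stall once its rollback has finished. The only point shown is that the commit path releases the commit lock before it does anything that may wait on the merge scheduler.

```java
// Hypothetical sketch of "do merge-related waiting outside commitLock"; not Lucene's API.
class DeferredMergeSketch {
    private final Object commitLock = new Object(); // plays IndexWriter's commitLock
    private final Object scheduler = new Object();  // plays the merge scheduler monitor
    private boolean stalled = true;                 // guarded by scheduler

    void commit() throws InterruptedException {
        synchronized (commitLock) {
            // Commit work that truly needs the lock would go here.
        }
        // Merge-related waiting happens only after commitLock has been released.
        synchronized (scheduler) {
            while (stalled) {
                scheduler.wait();
            }
        }
    }

    void tragicEvent() {
        synchronized (commitLock) {
            // Rollback work would go here; the lock is free, so this cannot hang.
        }
        synchronized (scheduler) {
            stalled = false;        // assumption: the stall ends once rollback is done
            scheduler.notifyAll();
        }
    }

    /** Runs a commit and a tragic merge event concurrently; true if both threads finish. */
    static boolean bothThreadsFinish() throws InterruptedException {
        DeferredMergeSketch s = new DeferredMergeSketch();
        Thread commit = new Thread(() -> {
            try {
                s.commit();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "commit");
        Thread merge = new Thread(s::tragicEvent, "merge");
        commit.setDaemon(true);
        merge.setDaemon(true);
        commit.start();
        merge.start();
        commit.join(5000);
        merge.join(5000);
        return !commit.isAlive() && !merge.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("both threads finished: " + bothThreadsFinish());
    }
}
```

Whichever thread runs first, neither ever holds the commit lock while waiting on the scheduler, so the rollback can always acquire it.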

        jira-bot ASF subversion and git services added a comment -

        Commit 2b073a2f296289617bea8256d7efec06049df739 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2b073a2 ]

        LUCENE-7570: don't run merges while holding the commitLock to prevent deadlock when merges are stalled and a tragic merge exception strikes

        jira-bot ASF subversion and git services added a comment -

        Commit ea3f8363319955c589eb3a7df59a031621852d3e in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ea3f836 ]

        LUCENE-7570: don't run merges while holding the commitLock to prevent deadlock when merges are stalled and a tragic merge exception strikes

        jira-bot ASF subversion and git services added a comment -

        Commit 2b073a2f296289617bea8256d7efec06049df739 in lucene-solr's branch refs/heads/feature/metrics from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2b073a2 ]

        LUCENE-7570: don't run merges while holding the commitLock to prevent deadlock when merges are stalled and a tragic merge exception strikes

        mikemccand Michael McCandless added a comment -

        Reopening to backport to 5.5.4.

        jira-bot ASF subversion and git services added a comment -

        Commit 7a9b568bda29b74333bfb74c7420b4511562253f in lucene-solr's branch refs/heads/branch_5_5 from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7a9b568 ]

        LUCENE-7570: fix IndexWriter deadlock when a tragic merge exception is hit while too many merges are running


          People

          • Assignee: Michael McCandless (mikemccand)
          • Reporter: Joey Echeverria (fwiffo)
          • Votes: 1
          • Watchers: 4
