Lucene - Core / LUCENE-6381

DocumentsWriterStallControl's .wait() should have a time limit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This build hung: http://build-us-00.elastic.co/job/es_core_15_centos/230/testReport/junit/org.elasticsearch.index.engine/InternalEngineTests/testDeletesAloneCanTriggerRefresh/

      Only one thread was stalled in DocumentsWriterStallControl, which means we have a bug somewhere, because that thread should have un-stalled once the other (too many) threads finished flushing their segments.

      I think we should make a simple defensive change here: instead of wait(), which waits forever for a .notify/All() to wake it up, we should wait for up to a time limit. This way when any concurrency bug like this strikes, we won't hang forever.
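
      A minimal sketch of the idea, not the actual patch: the class name TimedStallControl, its method signatures, and the caller-supplied timeout are all assumptions made for illustration. It shows a stalled thread using Object.wait(long) rather than wait(), so that it returns on its own even if the expected notifyAll() is never sent.

```java
/**
 * Hypothetical sketch only; this is NOT Lucene's DocumentsWriterStallControl.
 * It illustrates replacing an unbounded wait() with a bounded wait(long).
 */
final class TimedStallControl {
  private boolean stalled; // guarded by "this"

  /** Sets the stall flag; wakes all waiters when the stall is lifted. */
  synchronized void updateStalled(boolean stalled) {
    this.stalled = stalled;
    if (!stalled) {
      notifyAll();
    }
  }

  /**
   * Blocks while stalled, but for at most maxWaitMillis (an assumed,
   * caller-chosen limit). Returns true if still stalled on return, so the
   * caller can decide whether to stall again.
   */
  synchronized boolean waitIfStalled(long maxWaitMillis) {
    if (maxWaitMillis <= 0) {
      // wait(0) means "wait forever", which is exactly what we want to avoid.
      throw new IllegalArgumentException("maxWaitMillis must be > 0");
    }
    if (stalled) {
      try {
        // Bounded wait: returns on notify/notifyAll, on timeout, or spuriously.
        wait(maxWaitMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
      }
    }
    return stalled;
  }
}
```

      With this shape, a missed notification costs at most one timeout period per call instead of hanging the thread forever; the underlying bug still needs to be found separately.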

      I cannot reproduce that particular hang... what's unique about that test is it uses a positively minuscule (1 KB) IW buffer.

      1. LUCENE-6381.patch
        2 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Simple patch, one line change. I'd like to backport to 5.1... outright hangs are bad.

        This is just a defensive step ... separately, we have some concurrency bug where a .notify/All() was not sent.

        ASF subversion and git services added a comment -

        Commit 1670585 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1670585 ]

        LUCENE-6381: defensively wait for a limited time during DWPT stall

        ASF subversion and git services added a comment -

        Commit 1670587 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1670587 ]

        LUCENE-6381: defensively wait for a limited time during DWPT stall

        ASF subversion and git services added a comment -

        Commit 1670589 from Michael McCandless in branch 'dev/branches/lucene_solr_5_1'
        [ https://svn.apache.org/r1670589 ]

        LUCENE-6381: defensively wait for a limited time during DWPT stall

        Timothy Potter added a comment -

        Bulk close after 5.1 release


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            3