Lucene - Core
  1. Lucene - Core
  2. LUCENE-5461

ControlledRealTimeReopenThread waitForGeneration might sleep for targetMaxStaleSec instead of targetMinStaleSec

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.6.1
    • Fix Version/s: 4.8, 6.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      If setting the tagetMinStaleSec to 0, sometimes a call to waitForGeneration will block for targetMaxStaleSec instead of immediately call maybeRefreshBlocking on the manager.

      In effect to targetMaxStaleSec cannot be set higher than acceptable blocking resolving of a specified generation.

      1. ControlledRealTimeReopenThread.patch
        2 kB
        Hans Lund
      2. LUCENE-5461.patch
        8 kB
        Michael McCandless
      3. LUCENE-5461.patch
        5 kB
        Michael McCandless
      4. TestLucene5461.java
        4 kB
        Hans Lund

        Activity

        Hide
        Hans Lund added a comment -

        junit test showing the bug

        Show
        Hans Lund added a comment - junit test showing the bug
        Hide
        Michael McCandless added a comment -

        I tweaked the test a bit, to close everything, etc., but the test seems to pass for me.

        How did you see it fail before? Did it trip that assert about waiting too long for a gen?

        Show
        Michael McCandless added a comment - I tweaked the test a bit, to close everything, etc., but the test seems to pass for me. How did you see it fail before? Did it trip that assert about waiting too long for a gen?
        Hide
        Hans Lund added a comment -

        Yes exactly:

        java.lang.AssertionError: waited too long for generation 10001
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.apache.lucene.search.TestLucene5461.testCRTReopen(TestLucene5461.java:94)

        I've noticed when extracting the test that changing directory implementation influenced how often the test failed: - using a RAMDirectory failed to trigger the test. I've not tried with FSDirectory.

        Show
        Hans Lund added a comment - Yes exactly: java.lang.AssertionError: waited too long for generation 10001 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.lucene.search.TestLucene5461.testCRTReopen(TestLucene5461.java:94) I've noticed when extracting the test that changing directory implementation influenced how often the test failed: - using a RAMDirectory failed to trigger the test. I've not tried with FSDirectory.
        Hide
        Hans Lund added a comment -

        In the reopener thread the haswaiting local variable is not protected by lock. I've attached a patch that solves the inconsistent locking exposed in the test and the test now completes successful in every run

        Show
        Hans Lund added a comment - In the reopener thread the haswaiting local variable is not protected by lock. I've attached a patch that solves the inconsistent locking exposed in the test and the test now completes successful in every run
        Hide
        Michael McCandless added a comment -

        Ahh, nice catch! I was able to repro the failure before, but not after, your fix.

        I folded in the fix, made some minor formatting changes to the comments, absorbed the test case into the existing one. I think it's ready; I'll commit shortly.

        Thanks Hans!

        Show
        Michael McCandless added a comment - Ahh, nice catch! I was able to repro the failure before, but not after, your fix. I folded in the fix, made some minor formatting changes to the comments, absorbed the test case into the existing one. I think it's ready; I'll commit shortly. Thanks Hans!
        Hide
        ASF subversion and git services added a comment -

        Commit 1570559 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1570559 ]

        LUCENE-5461: fix thread hazard in ControlledRealTimeReopenThread causing a possible too-long wait time when a thread was waiting for a specific generation

        Show
        ASF subversion and git services added a comment - Commit 1570559 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1570559 ] LUCENE-5461 : fix thread hazard in ControlledRealTimeReopenThread causing a possible too-long wait time when a thread was waiting for a specific generation
        Hide
        ASF subversion and git services added a comment -

        Commit 1570560 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1570560 ]

        LUCENE-5461: fix thread hazard in ControlledRealTimeReopenThread causing a possible too-long wait time when a thread was waiting for a specific generation

        Show
        ASF subversion and git services added a comment - Commit 1570560 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1570560 ] LUCENE-5461 : fix thread hazard in ControlledRealTimeReopenThread causing a possible too-long wait time when a thread was waiting for a specific generation
        Hide
        ASF subversion and git services added a comment -

        Commit 1570562 from Michael McCandless in branch 'dev/branches/lucene_solr_4_7'
        [ https://svn.apache.org/r1570562 ]

        LUCENE-5461: fix thread hazard in ControlledRealTimeReopenThread causing a possible too-long wait time when a thread was waiting for a specific generation

        Show
        ASF subversion and git services added a comment - Commit 1570562 from Michael McCandless in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1570562 ] LUCENE-5461 : fix thread hazard in ControlledRealTimeReopenThread causing a possible too-long wait time when a thread was waiting for a specific generation
        Hide
        Michael McCandless added a comment -

        Thanks Hans!

        I ported back to 4.7.x branch in case we spin again.

        Show
        Michael McCandless added a comment - Thanks Hans! I ported back to 4.7.x branch in case we spin again.
        Hide
        Hans Lund added a comment -

        Super, thanks Michael; I'll read the formatting rules a bit closer

        Show
        Hans Lund added a comment - Super, thanks Michael; I'll read the formatting rules a bit closer
        Hide
        Uwe Schindler added a comment -

        Close issue after release of 4.8.0

        Show
        Uwe Schindler added a comment - Close issue after release of 4.8.0

          People

          • Assignee:
            Unassigned
            Reporter:
            Hans Lund
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development