Lucene - Core
  1. Lucene - Core
  2. LUCENE-2095

Document not guaranteed to be found after write and commit

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.1, 2.9.1
    • Fix Version/s: 2.9.2, 3.0.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      Linux 64bit

    • Lucene Fields:
      New

      Description

      after same email on developer list:
      "I developed a stress test to assert that a new document containing a
      specific term "X" is always found after a commit on the IndexWriter.
      This works most of the time, but it fails under load in rare occasions.

      I'm testing with 40 Threads, both with a SerialMergeScheduler and a
      ConcurrentMergeScheduler, all sharing a common IndexWriter.
      Attached testcase is using a RAMDirectory only, but I verified a
      FSDirectory behaves in the same way so I don't believe it's the
      Directory implementation or the MergeScheduler.
      This test is slow, so I don't consider it a functional or unit test.
      It might give false positives: it doesn't always fail, sorry I
      couldn't find out how to make it more likely to happen, besides
      scheduling it to run for a longer time."

      I tested this to affect versions 2.4.1 and 2.9.1;

      1. LUCENE-2095.patch
        10 kB
        Michael McCandless
      2. lucene-stresstest.patch
        13 kB
        Sanne Grinovero

        Activity

        Hide
        vijaykumarraja.grandhi added a comment -

        Thank you for guiding me correctly. will do that.

        Show
        vijaykumarraja.grandhi added a comment - Thank you for guiding me correctly. will do that.
        Hide
        Simon Willnauer added a comment -

        Hey man, you should really ask you question on a Lucene .NET mailing list and NOT on a Lucene Java JIRA issues which is completely unrelated.
        Please make sure you are asking your questions on the appropriate list - those folks will be happy to help and answer your question.

        simon

        Show
        Simon Willnauer added a comment - Hey man, you should really ask you question on a Lucene .NET mailing list and NOT on a Lucene Java JIRA issues which is completely unrelated. Please make sure you are asking your questions on the appropriate list - those folks will be happy to help and answer your question. simon
        Hide
        vijaykumarraja.grandhi added a comment - - edited

        Please help me. Slowly all my trails are getting dried out. failing to resolve multi threading with Lucene. It is getting deadlog. Always I am seeing some Write.Lock file inside Index folder.

        Show
        vijaykumarraja.grandhi added a comment - - edited Please help me. Slowly all my trails are getting dried out. failing to resolve multi threading with Lucene. It is getting deadlog. Always I am seeing some Write.Lock file inside Index folder.
        Hide
        vijaykumarraja.grandhi added a comment -

        I am currently using LUCENE Net 2.9.2 version. We have upgraded from v1.9.0 to 2.9.2. Basically we want to use threading concept now. But i am strucked with a lock. How to over come with these locks. Can any one provide .net code sample. Thank you in advance.

        Show
        vijaykumarraja.grandhi added a comment - I am currently using LUCENE Net 2.9.2 version. We have upgraded from v1.9.0 to 2.9.2. Basically we want to use threading concept now. But i am strucked with a lock. How to over come with these locks. Can any one provide .net code sample. Thank you in advance.
        Hide
        Michael McCandless added a comment -

        Thanks a lot Michael,

        Thank you!

        this makes my distributed testing reliable again

        I'm glad to hear it Thanks for bringing closure...

        I see you didn't apply my testcase

        Actually, I did: I boiled your testcase down and added it, in TestIndexWriter.java (testCommitThreadSafety). The more tests the better!

        Show
        Michael McCandless added a comment - Thanks a lot Michael, Thank you! this makes my distributed testing reliable again I'm glad to hear it Thanks for bringing closure... I see you didn't apply my testcase Actually, I did: I boiled your testcase down and added it, in TestIndexWriter.java (testCommitThreadSafety). The more tests the better!
        Hide
        Sanne Grinovero added a comment -

        Thanks a lot Michael, this makes my distributed testing reliable again

        I see you didn't apply my testcase, do you think it's not needed to have such a test?
        If you need I could change it as you wish.

        Show
        Sanne Grinovero added a comment - Thanks a lot Michael, this makes my distributed testing reliable again I see you didn't apply my testcase, do you think it's not needed to have such a test? If you need I could change it as you wish.
        Hide
        Michael McCandless added a comment -

        Thanks Sanne!

        Show
        Michael McCandless added a comment - Thanks Sanne!
        Hide
        Michael McCandless added a comment -

        Thanks for reminding me – I'll remove it on backport.

        Show
        Michael McCandless added a comment - Thanks for reminding me – I'll remove it on backport.
        Hide
        Uwe Schindler added a comment -

        Your patch only works for trunk and 3.0. If you want to fix it in 2.9 you must remove the j.u.concurrent* class usage.

        Show
        Uwe Schindler added a comment - Your patch only works for trunk and 3.0. If you want to fix it in 2.9 you must remove the j.u.concurrent* class usage.
        Hide
        Michael McCandless added a comment -

        Patch attached. I think it's ready to commit:

        • Added a simple monitor lock during commit() to serialize threads.
        • Boiled down the test case to a new test case in TestIndexWriter.
        • Found & fixed the [unrelated] deadlock in TestCrash, and cutover
          RAMDir's tracking of sizeInBytes to an AtomicLong, as discussed on
          java-dev.
        Show
        Michael McCandless added a comment - Patch attached. I think it's ready to commit: Added a simple monitor lock during commit() to serialize threads. Boiled down the test case to a new test case in TestIndexWriter. Found & fixed the [unrelated] deadlock in TestCrash, and cutover RAMDir's tracking of sizeInBytes to an AtomicLong, as discussed on java-dev.
        Hide
        Michael McCandless added a comment -

        I think the main cost is often fsync'ing the new files.

        But I agree we should simply lock so only one thread can be in commit
        at once. Concurrency inside commit (when fscyning) seems silly (it
        used to be slightly more interesting when we had autoCommit=true).

        We shouldn't use IW's lock. First, it's overkill (doing so would
        unnecessarily block other sync'd ops from running). Second, it leads
        to deadlock, at least in CMS (if it waits because too many merges are
        running). I'll use a separate lock.

        I'm not adding locking around the separate calls to prepareCommit then
        commit. An app must ensure these two calls are always balanced.

        Making flush unsynchronized would be great but I think wouldn't be
        that big a gain in practice, unless we could make it truly unsync'd
        even with adding new docs. Ie, if while we are flushing the last
        segment, we can accept adding/deleting docs into a new ram buffer in
        DW, that'd be quite interesting. But that's a big change!

        Show
        Michael McCandless added a comment - I think the main cost is often fsync'ing the new files. But I agree we should simply lock so only one thread can be in commit at once. Concurrency inside commit (when fscyning) seems silly (it used to be slightly more interesting when we had autoCommit=true). We shouldn't use IW's lock. First, it's overkill (doing so would unnecessarily block other sync'd ops from running). Second, it leads to deadlock, at least in CMS (if it waits because too many merges are running). I'll use a separate lock. I'm not adding locking around the separate calls to prepareCommit then commit. An app must ensure these two calls are always balanced. Making flush unsynchronized would be great but I think wouldn't be that big a gain in practice, unless we could make it truly unsync'd even with adding new docs. Ie, if while we are flushing the last segment, we can accept adding/deleting docs into a new ram buffer in DW, that'd be quite interesting. But that's a big change!
        Hide
        Jason Rutherglen added a comment -

        Why do we allow un-synchronized commit if doFlush is
        synchronized? The main cost of commit is most likely in the
        flush as it doesn't wait for merges? There's a todo above
        doFlush indicating we may want to make it un-synchronized in the
        future to not block merges, how tenable is this?

        Show
        Jason Rutherglen added a comment - Why do we allow un-synchronized commit if doFlush is synchronized? The main cost of commit is most likely in the flush as it doesn't wait for merges? There's a todo above doFlush indicating we may want to make it un-synchronized in the future to not block merges, how tenable is this?
        Hide
        Michael McCandless added a comment -

        I can get this to fail fairly quickly, using 2 threads.

        It's a thread safety issue of commit itself. If you only allow 1 thread into commit at a time I believe the issue won't happen.

        It has to do with what thread #2 does when it enters commit and thread #1 is already in the process of committing. In this case thread #2 basically waits for thread #1 to finish and then returns, whereas it should start its own new commit.

        Show
        Michael McCandless added a comment - I can get this to fail fairly quickly, using 2 threads. It's a thread safety issue of commit itself. If you only allow 1 thread into commit at a time I believe the issue won't happen. It has to do with what thread #2 does when it enters commit and thread #1 is already in the process of committing. In this case thread #2 basically waits for thread #1 to finish and then returns, whereas it should start its own new commit.
        Hide
        Sanne Grinovero added a comment -

        attaching the testcase, apply to version 2.9.1.
        It's slow, please be patient.

        Show
        Sanne Grinovero added a comment - attaching the testcase, apply to version 2.9.1. It's slow, please be patient.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Sanne Grinovero
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development