Lucene - Core
  1. Lucene - Core
  2. LUCENE-4596

DirectoryTaxonomyWriter concurrency bug

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, Trunk
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Mike tripped this error while running some benchmarks:

      {no format}

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 130
      at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:835)
      at org.apache.lucene.facet.index.streaming.CategoryParentsStream.incrementToken(CategoryParentsStream.java:106)
      at org.apache.lucene.facet.index.streaming.CountingListTokenizer.incrementToken(CountingListTokenizer.java:63)
      at org.apache.lucene.facet.index.streaming.CategoryTokenizer.incrementToken(CategoryTokenizer.java:48)
      at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:177)
      at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:272)
      at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
      at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
      at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1455)
      at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1130)
      at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1111)
      at perf.IndexThreads$IndexThread.run(IndexThreads.java:335)

      
      

      At first we thought this might be related to LUCENE-4565, but he reverted to before that commit and still hit the exception. I modified TestDirTaxoWriter.testConcurrency to index hierarchical categories, thinking that's the cause, but failed to reproduce.

      Eventually I realized that the test doesn't call getParent(), because it tests DirTaxoWriter concurrency, not concurrent indexing. As soon as I added a call to getParent, I hit this exception too.

      Adding 'synchronized' to DirTaxoWriter.addCategory seems to avoid that ex.

      I'll upload a patch with the modifications to the test and dig.

      1. LUCENE-4596.patch
        14 kB
        Shai Erera
      2. LUCENE-4596.patch
        4 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Patch with modifications to testConcurrency which exposes the bug.

        At any rate, I'll add a concurrency indexing test !

        Show
        Shai Erera added a comment - Patch with modifications to testConcurrency which exposes the bug. At any rate, I'll add a concurrency indexing test !
        Hide
        Shai Erera added a comment -

        Marked the issue blocker for 4.1. If I won't fix it by then, we should at least add 'synchronized' again to addCategory.

        Show
        Shai Erera added a comment - Marked the issue blocker for 4.1. If I won't fix it by then, we should at least add 'synchronized' again to addCategory.
        Hide
        Michael McCandless added a comment -

        I just tested indexing first 10M Wikipedia docs w/ date facets, using 12 threads.

        W/o the sync (ie as checked in) it took 212.7 seconds.

        With the sync added, it took 216.6 seconds.

        That's pretty minor ... maybe we should just add the sync...?

        Show
        Michael McCandless added a comment - I just tested indexing first 10M Wikipedia docs w/ date facets, using 12 threads. W/o the sync (ie as checked in) it took 212.7 seconds. With the sync added, it took 216.6 seconds. That's pretty minor ... maybe we should just add the sync...?
        Hide
        Shai Erera added a comment -

        2% isn't critical, but I'd like to get to the bottom of it before I add the sync back. For instance, I added this line to getParent:

        assert ordinal < parents.length : "requested ordinal (" + ordinal + "); parents.length (" + parents.length + ") !";
        

        And the failures are:

        7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
        WARNING: Uncaught exception in thread: Thread[Thread-10,5,TGRP-TestDirectoryTaxonomyWriter]
        Throwable occurred: java.lang.AssertionError: requested ordinal (2078) > parents.length (2078) !
        	at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0)
        	at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837)
        	at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267)
        7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
        WARNING: Uncaught exception in thread: Thread[Thread-6,5,TGRP-TestDirectoryTaxonomyWriter]
        Throwable occurred: java.lang.AssertionError: requested ordinal (2338) > parents.length (2338) !
        	at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0)
        	at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837)
        	at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267)
        Thread-9 growing array: curlen=2962 ord=2962 newlen=3334
        Thread-11 growing array: curlen=3334 ord=3334 newlen=3752
        7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
        WARNING: Uncaught exception in thread: Thread[Thread-9,5,TGRP-TestDirectoryTaxonomyWriter]
        Throwable occurred: java.lang.AssertionError: requested ordinal (3334) > parents.length (3334) !
        	at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0)
        	at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837)
        	at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267)
        7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
        WARNING: Uncaught exception in thread: Thread[Thread-8,5,TGRP-TestDirectoryTaxonomyWriter]
        Throwable occurred: java.lang.AssertionError: requested ordinal (3334) > parents.length (3334) !
        	at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0)
        	at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837)
        	at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267)
        7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
        WARNING: Uncaught exception in thread: Thread[Thread-7,5,TGRP-TestDirectoryTaxonomyWriter]
        Throwable occurred: java.lang.AssertionError: requested ordinal (3334) > parents.length (3334) !
        	at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0)
        	at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837)
        	at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267)
        

        Note how the requested ordinal always equals to parents.length !! I need to figure this out, but perhaps another set of eyes can help. The test fails pretty easily with the attached patch.

        Show
        Shai Erera added a comment - 2% isn't critical, but I'd like to get to the bottom of it before I add the sync back. For instance, I added this line to getParent: assert ordinal < parents.length : "requested ordinal (" + ordinal + "); parents.length (" + parents.length + ") !" ; And the failures are: 7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException WARNING: Uncaught exception in thread: Thread[Thread-10,5,TGRP-TestDirectoryTaxonomyWriter] Throwable occurred: java.lang.AssertionError: requested ordinal (2078) > parents.length (2078) ! at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0) at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837) at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267) 7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException WARNING: Uncaught exception in thread: Thread[Thread-6,5,TGRP-TestDirectoryTaxonomyWriter] Throwable occurred: java.lang.AssertionError: requested ordinal (2338) > parents.length (2338) ! at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0) at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837) at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267) Thread-9 growing array: curlen=2962 ord=2962 newlen=3334 Thread-11 growing array: curlen=3334 ord=3334 newlen=3752 7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException WARNING: Uncaught exception in thread: Thread[Thread-9,5,TGRP-TestDirectoryTaxonomyWriter] Throwable occurred: java.lang.AssertionError: requested ordinal (3334) > parents.length (3334) ! at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0) at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837) at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267) 7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException WARNING: Uncaught exception in thread: Thread[Thread-8,5,TGRP-TestDirectoryTaxonomyWriter] Throwable occurred: java.lang.AssertionError: requested ordinal (3334) > parents.length (3334) ! at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0) at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837) at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267) 7/Dez/2012 18:33:06 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException WARNING: Uncaught exception in thread: Thread[Thread-7,5,TGRP-TestDirectoryTaxonomyWriter] Throwable occurred: java.lang.AssertionError: requested ordinal (3334) > parents.length (3334) ! at __randomizedtesting.SeedInfo.seed([1CE23D1B45DD9EE5]:0) at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.getParent(DirectoryTaxonomyWriter.java:837) at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyWriter$4.run(TestDirectoryTaxonomyWriter.java:267) Note how the requested ordinal always equals to parents.length !! I need to figure this out, but perhaps another set of eyes can help. The test fails pretty easily with the attached patch.
        Hide
        Shai Erera added a comment -

        Ok, that was an interesting experience. Mike and I chatted about it, I'm w/o the code, I ask Mike to look here, he pastes code, then look there, he pastes another code ... like playing blind chess !

        And then bam ! bug found (we think, Mike is still beasting). The code first updates DTW.cache, then updates the parents array. So what probably happens is:

        • T1 addCategory(123), updates cache
        • context switch, T2 addCategory(123), receives that 123 is found.
        • T2 calls getParent(123), BOOM! parentArray still not updated by T1

        Simple fix, swap the two lines in addCategoryDocument. Cache should always be updated last !

        It's late and weekend here, I'll do some beasting too, and if all goes well, will commit the fix by Sunday !

        Thanks Mike for guiding me in the dark !

        Show
        Shai Erera added a comment - Ok, that was an interesting experience. Mike and I chatted about it, I'm w/o the code, I ask Mike to look here, he pastes code, then look there, he pastes another code ... like playing blind chess ! And then bam ! bug found (we think, Mike is still beasting). The code first updates DTW.cache, then updates the parents array. So what probably happens is: T1 addCategory(123), updates cache context switch, T2 addCategory(123), receives that 123 is found. T2 calls getParent(123), BOOM! parentArray still not updated by T1 Simple fix, swap the two lines in addCategoryDocument. Cache should always be updated last ! It's late and weekend here, I'll do some beasting too, and if all goes well, will commit the fix by Sunday ! Thanks Mike for guiding me in the dark !
        Hide
        Michael McCandless added a comment -

        Beasting ran many iterations with that change w/o any failures! I think it's a keeper

        Show
        Michael McCandless added a comment - Beasting ran many iterations with that change w/o any failures! I think it's a keeper
        Hide
        Shai Erera added a comment -

        Patch fixes the bug and adds TestConcurrentFacetsIndexing. For now, it looks very similar to TestDirTaxoWriter.testConcurrency, only adds indexing (fails without the fix !).

        I'll commit this shortly.

        Show
        Shai Erera added a comment - Patch fixes the bug and adds TestConcurrentFacetsIndexing. For now, it looks very similar to TestDirTaxoWriter.testConcurrency, only adds indexing (fails without the fix !). I'll commit this shortly.
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1418860

        LUCENE-4596: DirectoryTaxonomyWriter concurrency bug

        Show
        Commit Tag Bot added a comment - [trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1418860 LUCENE-4596 : DirectoryTaxonomyWriter concurrency bug
        Hide
        Shai Erera added a comment -

        Committed (trunk and 4x). Thanks Mike for your help and verifying the patch solves the issue !

        Show
        Shai Erera added a comment - Committed (trunk and 4x). Thanks Mike for your help and verifying the patch solves the issue !
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1418873

        LUCENE-4596: DirectoryTaxonomyWriter concurrency bug

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1418873 LUCENE-4596 : DirectoryTaxonomyWriter concurrency bug
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1418873

        LUCENE-4596: DirectoryTaxonomyWriter concurrency bug

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1418873 LUCENE-4596 : DirectoryTaxonomyWriter concurrency bug

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development