Lucene - Core
LUCENE-1918

Adding empty ParallelReader indexes to an IndexWriter may cause ArrayIndexOutOfBoundsException or NoSuchElementException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.1, 2.9
    • Fix Version/s: 2.4.1, 2.9
    • Component/s: core/index
    • Labels: None
    • Environment: any
    • Lucene Fields: New, Patch Available

      Description

      Hi,
      I recently stumbled upon this:

      It is possible (and perfectly legal) to add empty indexes (IndexReaders) to an IndexWriter. However, when ParallelReaders are used in this context, RuntimeExceptions may occur in two situations for no good reason.

      Condition 1:
      The indexes within the ParallelReader are just empty.

      When adding them to the IndexWriter, we get a java.util.NoSuchElementException triggered by ParallelTermEnum's constructor. The cause is the call to TreeMap#firstKey(), which was assumed to return null if the map has no entries. That is not the case: firstKey() throws NoSuchElementException on an empty map, and it only returns null if the first key in the map is actually null.
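To illustrate the firstKey() behavior described above, here is a minimal standalone sketch using only java.util. The class and method names (FirstKeyDemo, firstKeyOrNull, unguardedFirstKeyThrows) are hypothetical and are not taken from the attached patches; the isEmpty() check is just one plausible way to guard the call, not necessarily what the committed fix does.

```java
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

class FirstKeyDemo {

    // Hypothetical guard: returns the smallest key, or null when the map has no entries.
    static String firstKeyOrNull(SortedMap<String, ?> map) {
        return map.isEmpty() ? null : map.firstKey();
    }

    // Returns true if an unguarded firstKey() call throws on the given map.
    static boolean unguardedFirstKeyThrows(SortedMap<String, ?> map) {
        try {
            map.firstKey();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> fields = new TreeMap<String, Integer>();

        // Empty map: the unguarded call throws, the guarded call yields null.
        System.out.println(unguardedFirstKeyThrows(fields)); // prints "true"
        System.out.println(firstKeyOrNull(fields));          // prints "null"

        // Non-empty map: both approaches see the first key.
        fields.put("title", 1);
        System.out.println(firstKeyOrNull(fields));          // prints "title"
    }
}
```

Code that wants "null means no terms" semantics therefore has to check for emptiness itself before asking for the first key.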

      Condition 2 (Assuming the aforementioned bug is fixed):
      The indexes within the ParallelReader originally contained one or more fields with TermVectors, but all documents have been marked as deleted.

      When adding the indexes to the IndexWriter, we get a java.lang.ArrayIndexOutOfBoundsException triggered by TermVectorsWriter#addAllDocVectors. The reason here is that TermVectorsWriter assumes that if the index is marked as having TermVectors, at least one field with term vectors actually exists. Unfortunately, this is not true either.
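The general failure pattern behind Condition 2 (code that reads the first element of a per-document array without checking that the array is non-empty) can be sketched in isolation. The following is not the Lucene code: the report does not show the failing statement, so FieldPointerDeltas, deltas, and the delta-encoding scheme are hypothetical, chosen only to show how an explicit length check avoids the ArrayIndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.List;

class FieldPointerDeltas {

    // Hypothetical helper: encodes every pointer after the first as the
    // difference from its predecessor. Without the length check, reading
    // fieldPointers[0] would throw ArrayIndexOutOfBoundsException when a
    // document has no fields.
    static List<Long> deltas(long[] fieldPointers) {
        List<Long> out = new ArrayList<Long>();
        if (fieldPointers.length == 0) {
            return out; // no fields for this document: nothing to encode
        }
        long last = fieldPointers[0];
        for (int i = 1; i < fieldPointers.length; i++) {
            out.add(fieldPointers[i] - last);
            last = fieldPointers[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(deltas(new long[0]));              // prints "[]"
        System.out.println(deltas(new long[] {10, 25, 40}));  // prints "[15, 15]"
    }
}
```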

      Patches and a testcase demonstrating the two bugs are provided.

      Cheers,
      Christian

      1. LUCENE-1918.patch (8 kB, Michael McCandless)
      2. ParallelReaderWithEmptyIndex.patch (2 kB, Christian Kohlschütter)
      3. ParallelReaderWithEmptyIndex-testcase.patch (3 kB, Christian Kohlschütter)

        Activity

        Christian Kohlschütter added a comment -

        Testcase and bugfixes for trunk (should also be applicable to 2.4.1)

        Michael McCandless added a comment -

        Patch looks good! Thanks Christian. Good catches!

        I made minor changes to it: added a CHANGES entry, fixed indentation, switched the test over to MockRAMDir (and closed them), and added checkIndex calls.

        Michael McCandless added a comment -

        Mark, I think we should commit this for 2.9?

        Uwe Schindler added a comment -

        I have no problem with this. I think we need a new RC for sure because of LUCENE-1919, which is very tricky and should be tested by the public, who surely have a lot of old-style TokenStreams.

        Mark Miller added a comment -

        Agreed - it sounds like we are now stuck with a new RC, so let's fix what we can.

        Michael McCandless added a comment -

        Thanks Christian!


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Christian Kohlschütter
          • Votes:
            0
            Watchers:
            1

              Time Tracking

              Estimated: 0.5h
              Remaining: 0.5h
              Logged: Not Specified