  1. Lucene - Core
  2. LUCENE-1918

Adding empty ParallelReader indexes to an IndexWriter may cause ArrayIndexOutOfBoundsException or NoSuchElementException

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.1, 2.9
    • Fix Version/s: 2.4.1, 2.9
    • Component/s: core/index
    • Labels: None
    • Environment: any
    • Lucene Fields: New, Patch Available

    Description

      Hi,
      I recently stumbled upon this:

      It is possible (and perfectly legal) to add empty indexes (IndexReaders) to an IndexWriter. However, when ParallelReaders are used in this context, there are two situations in which RuntimeExceptions occur although nothing is actually wrong.

      Condition 1:
      The indexes within the ParallelReader are just empty.

      When adding them to the IndexWriter, we get a java.util.NoSuchElementException triggered by ParallelTermEnum's constructor. The cause is the call to TreeMap#firstKey(), which was assumed to return null if the map has no entries. That assumption is wrong: firstKey() throws NoSuchElementException on an empty map and returns null only if the first key in the map is itself null.
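      The firstKey() behavior described above can be checked in isolation. The following is a minimal sketch in current Java, not the code from the attached patch; the class name FirstKeyDemo and the helper firstKeyOrNull are hypothetical names chosen for illustration.

```java
import java.util.NoSuchElementException;
import java.util.TreeMap;

public class FirstKeyDemo {

    /** Hypothetical helper: smallest key, or null when the map is empty. */
    static String firstKeyOrNull(TreeMap<String, ?> map) {
        // Check for emptiness first; firstKey() itself never signals "empty" with null.
        return map.isEmpty() ? null : map.firstKey();
    }

    /** Returns true if calling firstKey() on the given map throws NoSuchElementException. */
    static boolean firstKeyThrows(TreeMap<String, ?> map) {
        try {
            map.firstKey();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> fields = new TreeMap<>();
        System.out.println(firstKeyThrows(fields));  // prints "true"
        System.out.println(firstKeyOrNull(fields));  // prints "null"
        fields.put("title", 1);
        System.out.println(firstKeyOrNull(fields));  // prints "title"
    }
}
```

      Guarding with isEmpty() is one way to get the "null when empty" behavior that the original code seems to have expected; whether the committed fix takes this exact form is not stated here.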

      Condition 2 (assuming the aforementioned bug is fixed):
      The indexes within the ParallelReader originally contained one or more fields with TermVectors, but all documents have been marked as deleted.

      When adding the indexes to the IndexWriter, we get a java.lang.ArrayIndexOutOfBoundsException triggered by TermVectorsWriter#addAllDocVectors. The cause here is that TermVectorsWriter assumes that an index marked as having TermVectors always has at least one field with term vectors to write. Unfortunately, that is not true either.
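      The general failure pattern (reading the first element of an array that turns out to be empty) can be sketched as follows. This is an illustration only and not Lucene's TermVectorsWriter code; the class PointerDeltaDemo, both method names, and the delta-encoding of a fieldPointers array are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class PointerDeltaDemo {

    /** Unguarded variant: assumes at least one field exists. */
    static List<Long> deltasUnguarded(long[] fieldPointers) {
        List<Long> out = new ArrayList<>();
        // Throws ArrayIndexOutOfBoundsException when fieldPointers is empty.
        long last = fieldPointers[0];
        for (int i = 1; i < fieldPointers.length; i++) {
            out.add(fieldPointers[i] - last);
            last = fieldPointers[i];
        }
        return out;
    }

    /** Guarded variant: element 0 is only read when there is something to encode. */
    static List<Long> deltasGuarded(long[] fieldPointers) {
        List<Long> out = new ArrayList<>();
        if (fieldPointers.length > 1) {
            long last = fieldPointers[0];
            for (int i = 1; i < fieldPointers.length; i++) {
                out.add(fieldPointers[i] - last);
                last = fieldPointers[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(deltasGuarded(new long[0]));             // prints "[]"
        System.out.println(deltasGuarded(new long[] {10, 25, 40})); // prints "[15, 15]"
    }
}
```

      With zero fields the guarded variant simply produces no deltas, which corresponds to the "TermVectors enabled, but nothing to write" case described above.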

      Patches and a testcase demonstrating the two bugs are provided.

      Cheers,
      Christian

      Attachments

        1. ParallelReaderWithEmptyIndex-testcase.patch
          3 kB
          Christian Kohlschütter
        2. ParallelReaderWithEmptyIndex.patch
          2 kB
          Christian Kohlschütter
        3. LUCENE-1918.patch
          8 kB
          Michael McCandless

          People

            Assignee: mikemccand Michael McCandless
            Reporter: ck@newsclub.de Christian Kohlschütter

              Time Tracking

                Original Estimate: 0.5h
                Remaining Estimate: 0.5h
                Time Spent: Not Specified