Lucene - Core / LUCENE-9755

Index Segment without DocValues May Cause Search to Fail

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 8.x, 8.3.1, 8.8
    • Fix Version/s: None
    • Component/s: core/search
    • Lucene Fields: New

    Description

      I am not sure whether this counts as a bug, but it is certainly a caveat that can easily slip through testing.

      Consider the following scenario:

      • all documents in the index have a field "numfield" indexed as IntPoint
      • in addition, SOME of those documents are also indexed with a SortedNumericDocValuesField using the same "numfield" name

      The documents without the DocValues cannot be matched by any query that involves sorting, so we save some space by omitting the DocValues for those documents.

      This works perfectly fine, unless

      • the index contains a segment that only contains documents without the DocValues

      In this case, running a query that sorts by "numfield" will throw the following exception:

      java.lang.IllegalStateException: unexpected docvalues type NONE for field 'numfield' (expected one of [SORTED_NUMERIC, NUMERIC]). Re-index with correct docvalues type.
         at org.apache.lucene.index.DocValues.checkField(DocValues.java:317)
         at org.apache.lucene.index.DocValues.getSortedNumeric(DocValues.java:389)
         at org.apache.lucene.search.SortedNumericSortField$3.getNumericDocValues(SortedNumericSortField.java:159)
         at org.apache.lucene.search.FieldComparator$NumericComparator.doSetNextReader(FieldComparator.java:155)
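      As far as I can tell from the Lucene 8.x sources, DocValues.getSortedNumeric() first asks the segment for SORTED_NUMERIC doc values, then for NUMERIC ones, and only if both are missing calls checkField(). That check returns quietly (and an empty iterator is used) when the segment has no FieldInfo for the field at all, but throws when the field exists with another doc values type. A points-only segment still has a FieldInfo for "numfield", with type NONE, hence the exception. The self-contained sketch below models that decision; DvType, Segment and resolve() are simplified, hypothetical stand-ins, not Lucene's actual classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CheckFieldModel {
    // Simplified stand-in for Lucene's DocValuesType.
    enum DvType { NONE, NUMERIC, SORTED_NUMERIC }

    // Simplified stand-in for a segment's FieldInfos: field name -> doc values type.
    // A field indexed only as IntPoint still appears here, with type NONE.
    static final class Segment {
        final Map<String, DvType> fieldInfos = new HashMap<>();

        Segment with(String field, DvType type) {
            fieldInfos.put(field, type);
            return this;
        }
    }

    // Models the branches of DocValues.getSortedNumeric() plus checkField():
    // returns the type name when values exist, "EMPTY" when the segment has
    // never seen the field, and throws when the field exists without doc values.
    static String resolve(Segment seg, String field) {
        DvType actual = seg.fieldInfos.get(field);
        if (actual == DvType.SORTED_NUMERIC || actual == DvType.NUMERIC) {
            return actual.name();
        }
        if (actual == null) {
            return "EMPTY"; // unknown field: treated as "no values in this segment"
        }
        throw new IllegalStateException("unexpected docvalues type " + actual
            + " for field '" + field + "' (expected one of "
            + Arrays.toString(new DvType[] {DvType.SORTED_NUMERIC, DvType.NUMERIC})
            + "). Re-index with correct docvalues type.");
    }

    public static void main(String[] args) {
        Segment withDocValues = new Segment().with("numfield", DvType.SORTED_NUMERIC);
        Segment pointsOnly = new Segment().with("numfield", DvType.NONE);
        System.out.println(resolve(withDocValues, "numfield")); // SORTED_NUMERIC
        System.out.println(resolve(new Segment(), "numfield")); // EMPTY
        try {
            resolve(pointsOnly, "numfield");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // same message as in the stack trace above
        }
    }
}
```

      Note that the doc values type is recorded per segment, which is why the same documents work when they share one segment.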

      I have attached a minimal example program (DocValuesTest.java) that demonstrates the issue. It will:

      • create an index with two documents, each having "numfield" indexed
      • add a DocValuesField "numfield" only for the first document
      • force the two documents into separate index segments
      • run a query that matches only the first document and sorts by "numfield"

      This results in the aforementioned exception.

      When removing the following lines from the code:

      if (i==docCount/2) {
        // committing here flushes the documents added so far into a segment of their own
        iw.commit();
      }

      both documents are added to the same segment. When re-running the code with a single index segment, the query works fine.
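      For readers without the attachment, a rough reproduction along the same lines might look like the sketch below. It is a hypothetical reconstruction, not the attached DocValuesTest.java; it assumes Lucene 8.x (lucene-core on the classpath, ByteBuffersDirectory as an in-memory directory), and the class name TwoSegmentRepro and the exact-match query are my own choices. The sorted search fails even though only the first segment contains a hit, because IndexSearcher sets up the sort comparator for every segment before scoring it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class TwoSegmentRepro {
    // Indexes two documents (only the first with doc values), optionally committing
    // in between, then runs a sorted query. Returns the segment count and outcome.
    static String run(boolean splitSegments) {
        try (StandardAnalyzer analyzer = new StandardAnalyzer();
             Directory dir = new ByteBuffersDirectory();
             IndexWriter iw = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document first = new Document();
            first.add(new IntPoint("numfield", 1));
            first.add(new SortedNumericDocValuesField("numfield", 1));
            iw.addDocument(first);
            if (splitSegments) {
                iw.commit(); // flush the first document into its own segment
            }
            Document second = new Document();
            second.add(new IntPoint("numfield", 2)); // point only, no doc values
            iw.addDocument(second);
            iw.commit();

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = IntPoint.newExactQuery("numfield", 1); // matches only the first doc
                Sort sort = new Sort(new SortedNumericSortField("numfield", SortField.Type.INT));
                int segments = reader.leaves().size();
                try {
                    searcher.search(query, 10, sort);
                    return segments + " segment(s): ok";
                } catch (IllegalStateException e) {
                    return segments + " segment(s): " + e.getMessage();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // single segment: expected to succeed
        System.out.println(run(true));  // two segments: expected to hit the IllegalStateException
    }
}
```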

      Tested with Lucene 8.3.1 and 8.8.0.

      Like I said, this may not be considered a bug. But it slipped through our testing because a DocValues-free segment like this is a rare and short-lived event.

      We can avoid this issue in the future by using a different field name for the DocValuesField. For our production systems, however, we have to patch DocValues.checkField() to suppress the IllegalStateException, since reindexing is not an option right now.
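      The renaming workaround could be sketched as follows, assuming Lucene 8.x; the field name "numfield_sort" and the class SeparateSortField are hypothetical. A segment that never received a sortable document has no FieldInfo for the doc values field at all, so checkField() has nothing to complain about and the comparator sees no values for that segment. This only helps for newly indexed data; existing segments would still need reindexing.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;

public class SeparateSortField {
    // Hypothetical name for the doc-values-only field; the point field keeps "numfield".
    static final String SORT_FIELD = "numfield_sort";

    static Document buildDoc(int value, boolean sortable) {
        Document doc = new Document();
        doc.add(new IntPoint("numfield", value)); // still used for matching
        if (sortable) {
            // Doc values live under their own name, so points-only segments never
            // register a FieldInfo for SORT_FIELD.
            doc.add(new SortedNumericDocValuesField(SORT_FIELD, value));
        }
        return doc;
    }

    // Sort on the separate field instead of "numfield".
    static Sort sortByNumfield() {
        return new Sort(new SortedNumericSortField(SORT_FIELD, SortField.Type.INT));
    }
}
```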

      Attachments

        1. DocValuesTest.java (4 kB, Thomas Hecker)


          People

            Assignee: Unassigned
            Reporter: Thomas Hecker (tomhecker)
            Votes: 0
            Watchers: 3
