Lucene - Core / LUCENE-10518

FieldInfos consistency check can refuse to open Lucene 8 index

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 8.10.1
    • Fix Version/s: 9.2, 9.1.1
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      A field-infos consistency check introduced in Lucene 9 (LUCENE-9334) can refuse to open a Lucene 8 index. Lucene 8 can create a partial FieldInfo if it hits a non-aborting exception (for example, a term that is too long) while processing the fields of a document. Lucene 9 does not have this problem because it processes fields in two phases, and the first phase handles only FieldInfos.

      The issue can be reproduced with this snippet.

      public void testWriteIndexOn8x() throws Exception {
        FieldType KeywordField = new FieldType();
        KeywordField.setTokenized(false);
        KeywordField.setOmitNorms(true);
        KeywordField.setIndexOptions(IndexOptions.DOCS);
        KeywordField.freeze();
      
        try (Directory dir = newDirectory()) {
          IndexWriterConfig config = new IndexWriterConfig();
          config.setCommitOnClose(false);
          config.setMergePolicy(NoMergePolicy.INSTANCE);
          try (IndexWriter writer = new IndexWriter(dir, config)) {
      
            // first segment
            writer.addDocument(new Document()); // an empty doc
            Document d1 = new Document();
            // An oversized term makes addDocument fail with a non-aborting exception
            // after the FieldInfo for "field" exists but before its doc values type is set.
            byte[] chars = new byte[IndexWriter.MAX_STORED_STRING_LENGTH + 1];
            Arrays.fill(chars, (byte) 'a');
            d1.add(new Field("field", new BytesRef(chars), KeywordField));
            d1.add(new BinaryDocValuesField("field", new BytesRef(chars)));
            expectThrows(IllegalArgumentException.class, () -> writer.addDocument(d1));
            writer.flush();
      
            // second segment
            Document d2 = new Document();
            d2.add(new Field("field", new BytesRef("hello world"), KeywordField));
            d2.add(new SortedDocValuesField("field", new BytesRef("hello world")));
            writer.addDocument(d2);
            writer.flush();
            writer.commit();
      
            // Check for doc values types consistency
            Map<String, DocValuesType> docValuesTypes = new HashMap<>();
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
              for (LeafReaderContext leaf : reader.leaves()) {
                for (FieldInfo fi : leaf.reader().getFieldInfos()) {
                  DocValuesType current = docValuesTypes.putIfAbsent(fi.name, fi.getDocValuesType());
                  if (current != null && current != fi.getDocValuesType()) {
                    fail("cannot change DocValues type from " + current + " to " + fi.getDocValuesType() + " for field \"" + fi.name + "\"");
                  }
                }
              }
            }
          }
        }
      }
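
      The cross-segment check at the end of the snippet can also be modeled without Lucene. The following is a minimal sketch, not Lucene's actual implementation: `FieldSchemaCheck`, `DvType`, and `firstConflict` are hypothetical names, and each segment is reduced to a map from field name to doc values type. It assumes the failed document left "field" registered with no doc values type in the first segment.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldSchemaCheck {
  /** Toy stand-in for a doc values type; only the values needed here. */
  enum DvType { NONE, BINARY, SORTED }

  /**
   * Walks the segments in order and returns a description of the first field
   * whose doc values type differs from an earlier segment, or null if all agree.
   */
  static String firstConflict(List<Map<String, DvType>> segments) {
    Map<String, DvType> seen = new HashMap<>();
    for (Map<String, DvType> segment : segments) {
      for (Map.Entry<String, DvType> e : segment.entrySet()) {
        DvType previous = seen.putIfAbsent(e.getKey(), e.getValue());
        if (previous != null && previous != e.getValue()) {
          return e.getKey() + ": " + previous + " -> " + e.getValue();
        }
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Assumed state: segment 1 holds the partial entry, segment 2 the complete one.
    Map<String, DvType> seg1 = Map.of("field", DvType.NONE);
    Map<String, DvType> seg2 = Map.of("field", DvType.SORTED);
    System.out.println(firstConflict(List.of(seg1, seg2))); // prints "field: NONE -> SORTED"
  }
}
```

      A strict reader that runs a check of this shape at open time would reject the index, even though every successfully indexed document is consistent.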
      

      I would like to propose the following:

      • Backport the two-phase field processing from Lucene 9 to Lucene 8. The patch should be small and contained.
      • Introduce an option in Lucene 9 to skip the field-infos consistency check (i.e., behave like Lucene 8 when the option is enabled).
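
      To illustrate why two-phase processing avoids the partial FieldInfo, here is a toy model. It is a sketch under assumptions, not the Lucene indexing chain: `TwoPhaseSketch`, `ToyField`, `DvType`, and the tiny `MAX_TERM_LENGTH` are all hypothetical. The single-pass variant updates the schema while indexing each field, so a failure on the first field leaves the schema without the doc values type. The two-phase variant records the schema for every field before indexing anything.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseSketch {
  enum DvType { NONE, BINARY, SORTED }

  /** Hypothetical field: a name, an optional term to index, and a doc values type. */
  static final class ToyField {
    final String name;
    final String term;
    final DvType dvType;

    ToyField(String name, String term, DvType dvType) {
      this.name = name;
      this.term = term;
      this.dvType = dvType;
    }
  }

  static final int MAX_TERM_LENGTH = 8; // toy limit, far smaller than any real one

  /** Rejects over-long terms; models a non-aborting per-document failure. */
  static void index(ToyField f) {
    if (f.term != null && f.term.length() > MAX_TERM_LENGTH) {
      throw new IllegalArgumentException("term too long for field " + f.name);
    }
  }

  /** Registers a field's contribution to the schema. */
  static void register(ToyField f, Map<String, DvType> schema) {
    schema.putIfAbsent(f.name, DvType.NONE);
    if (f.dvType != DvType.NONE) {
      schema.put(f.name, f.dvType);
    }
  }

  /** Single pass: schema update and indexing are interleaved per field. */
  static void singlePass(List<ToyField> doc, Map<String, DvType> schema) {
    for (ToyField f : doc) {
      register(f, schema);
      index(f); // a failure here skips the schema updates of later fields
    }
  }

  /** Two phases: the whole schema is recorded before any field is indexed. */
  static void twoPhase(List<ToyField> doc, Map<String, DvType> schema) {
    for (ToyField f : doc) {
      register(f, schema);
    }
    for (ToyField f : doc) {
      index(f);
    }
  }

  /** Runs one strategy on a document whose first field has an over-long term. */
  static Map<String, DvType> run(boolean useTwoPhase) {
    Map<String, DvType> schema = new LinkedHashMap<>();
    List<ToyField> doc = List.of(
        new ToyField("field", "aaaaaaaaaaaa", DvType.NONE),
        new ToyField("field", null, DvType.BINARY));
    try {
      if (useTwoPhase) {
        twoPhase(doc, schema);
      } else {
        singlePass(doc, schema);
      }
    } catch (IllegalArgumentException expected) {
      // The document is rejected, but the schema changes made so far remain.
    }
    return schema;
  }

  public static void main(String[] args) {
    System.out.println(run(false)); // prints "{field=NONE}"
    System.out.println(run(true));  // prints "{field=BINARY}"
  }
}
```

      In this model the document is rejected either way; the difference is only whether the schema entry left behind is partial or complete.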

      /cc mayya and jpountz

          People

            Assignee: Unassigned
            Reporter: Nhat Nguyen (dnhatn)
            Votes: 1
            Watchers: 5

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 40m