Details
Type: Bug
Status: Patch Available
Priority: Major
Resolution: Fixed
8.10.1
None
New
Description
A field-infos consistency check introduced in Lucene 9 (LUCENE-9334) can refuse to open a Lucene 8 index. Lucene 8 can create a partial FieldInfo if it hits a non-aborting exception (for example, a term that is too long) while processing the fields of a document. A later segment can then record a different schema for the same field (for example, a different doc-values type), and the Lucene 9 check rejects that. Lucene 9 itself does not have this problem because we process fields in two phases, and the first phase processes only FieldInfos.
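To illustrate why the processing order matters, here is a toy model in plain Java. It is not Lucene code: every class and method name is hypothetical, the per-field schema is reduced to an "indexed" flag plus a doc-values type, and term length is measured in chars rather than UTF-8 bytes. The demo document mimics d1 from the reproduction: an indexed field with a too-long term, followed by binary doc values under the same field name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Lucene code: contrasts single-pass and two-phase field processing.
public class TwoPhaseFieldsSketch {
    // Hypothetical stand-in for IndexWriter.MAX_TERM_LENGTH (32766); measured in chars here.
    static final int MAX_TERM_LENGTH = 32766;

    // Hypothetical per-field schema, reduced to "indexed" plus a doc-values type.
    static final class Schema {
        boolean indexed;
        String docValuesType = "NONE";
    }

    // Hypothetical input field: a term to index and/or a doc-values type (either may be null).
    static final class InputField {
        final String name;
        final String term;
        final String docValuesType;
        InputField(String name, String term, String docValuesType) {
            this.name = name;
            this.term = term;
            this.docValuesType = docValuesType;
        }
    }

    // Stand-in for inverting a field: rejects immense terms (a non-aborting error).
    static void index(InputField f) {
        if (f.term.length() > MAX_TERM_LENGTH) {
            throw new IllegalArgumentException("immense term in field=" + f.name);
        }
    }

    // Single pass: schema updates are interleaved with indexing, so an exception
    // leaves behind whatever was recorded so far (a partial schema).
    static void singlePass(List<InputField> doc, Map<String, Schema> infos) {
        for (InputField f : doc) {
            Schema s = infos.computeIfAbsent(f.name, n -> new Schema());
            if (f.term != null) {
                s.indexed = true;
                index(f); // may throw before later fields' doc values are recorded
            }
            if (f.docValuesType != null) {
                s.docValuesType = f.docValuesType;
            }
        }
    }

    // Two phases: record the complete schema of the document first, then index.
    static void twoPhase(List<InputField> doc, Map<String, Schema> infos) {
        for (InputField f : doc) {
            Schema s = infos.computeIfAbsent(f.name, n -> new Schema());
            if (f.term != null) {
                s.indexed = true;
            }
            if (f.docValuesType != null) {
                s.docValuesType = f.docValuesType;
            }
        }
        for (InputField f : doc) {
            if (f.term != null) {
                index(f);
            }
        }
    }

    // Like d1 in the reproduction: an immense term, then binary doc values, same field name.
    static List<InputField> demoDoc() {
        char[] chars = new char[MAX_TERM_LENGTH + 1];
        Arrays.fill(chars, 'a');
        return Arrays.asList(
            new InputField("field", new String(chars), null),
            new InputField("field", null, "BINARY"));
    }

    // Runs one strategy; the document is rejected but the recorded schema is kept.
    static Map<String, Schema> run(boolean singlePass, List<InputField> doc) {
        Map<String, Schema> infos = new LinkedHashMap<>();
        try {
            if (singlePass) {
                singlePass(doc, infos);
            } else {
                twoPhase(doc, infos);
            }
        } catch (IllegalArgumentException e) {
            // non-aborting: only this document fails
        }
        return infos;
    }

    public static void main(String[] args) {
        System.out.println("single pass: " + run(true, demoDoc()).get("field").docValuesType);
        System.out.println("two phase:   " + run(false, demoDoc()).get("field").docValuesType);
    }
}
```

Running main prints NONE for the single-pass variant (the partial schema) and BINARY for the two-phase variant, whose schema already matches the document even though indexing it failed.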
The issue can be reproduced on Lucene 8 with the following test (it relies on LuceneTestCase helpers such as newDirectory() and expectThrows()):
public void testWriteIndexOn8x() throws Exception {
  FieldType keywordField = new FieldType();
  keywordField.setTokenized(false);
  keywordField.setOmitNorms(true);
  keywordField.setIndexOptions(IndexOptions.DOCS);
  keywordField.freeze();
  try (Directory dir = newDirectory()) {
    IndexWriterConfig config = new IndexWriterConfig();
    config.setCommitOnClose(false);
    config.setMergePolicy(NoMergePolicy.INSTANCE);
    try (IndexWriter writer = new IndexWriter(dir, config)) {
      // first segment
      writer.addDocument(new Document()); // an empty doc
      Document d1 = new Document();
      // One byte over the maximum term length: indexing d1 throws a non-aborting
      // IllegalArgumentException without allocating a huge array.
      byte[] bytes = new byte[IndexWriter.MAX_TERM_LENGTH + 1];
      Arrays.fill(bytes, (byte) 'a');
      d1.add(new Field("field", new BytesRef(bytes), keywordField));
      d1.add(new BinaryDocValuesField("field", new BytesRef(bytes)));
      expectThrows(IllegalArgumentException.class, () -> writer.addDocument(d1));
      writer.flush();
      // second segment
      Document d2 = new Document();
      d2.add(new Field("field", new BytesRef("hello world"), keywordField));
      d2.add(new SortedDocValuesField("field", new BytesRef("hello world")));
      writer.addDocument(d2);
      writer.flush();
      writer.commit();
      // Check for doc values types consistency
      Map<String, DocValuesType> docValuesTypes = new HashMap<>();
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        for (LeafReaderContext leaf : reader.leaves()) {
          for (FieldInfo fi : leaf.reader().getFieldInfos()) {
            DocValuesType current = docValuesTypes.putIfAbsent(fi.name, fi.getDocValuesType());
            if (current != null && current != fi.getDocValuesType()) {
              fail("cannot change DocValues type from " + current + " to " + fi.getDocValuesType()
                  + " for field \"" + fi.name + "\"");
            }
          }
        }
      }
    }
  }
}
I would like to propose the following:
- Backport the two-phase fields processing from Lucene 9 to Lucene 8. The patch should be small and self-contained.
- Introduce an option in Lucene 9 to skip the field-infos consistency check (i.e., behave like Lucene 8 when the option is enabled).
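A minimal sketch of what the second proposal could look like, modeled on the doc-values check from the reproduction. This is not Lucene's implementation: each segment is reduced to a map from field name to doc-values type, only doc-values types are compared, and `skipConsistencyCheck` is a hypothetical flag name, not an existing Lucene option. The demo segments assume the first segment recorded NONE for "field" (the partial FieldInfo) and the second recorded SORTED.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene code: segments are modeled as field name -> doc-values type,
// and skipConsistencyCheck is an invented flag name.
public class FieldInfosConsistencySketch {

    // Returns the first doc-values type seen per field. With skipConsistencyCheck == false a
    // conflicting type in a later segment is rejected; with true the conflict is ignored,
    // which is how Lucene 8 behaves.
    static Map<String, String> check(List<Map<String, String>> segments, boolean skipConsistencyCheck) {
        Map<String, String> seen = new HashMap<>();
        for (Map<String, String> segment : segments) {
            for (Map.Entry<String, String> e : segment.entrySet()) {
                String current = seen.putIfAbsent(e.getKey(), e.getValue());
                if (current != null && !current.equals(e.getValue()) && !skipConsistencyCheck) {
                    throw new IllegalArgumentException("cannot change DocValues type from " + current
                        + " to " + e.getValue() + " for field \"" + e.getKey() + "\"");
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Assumed segments: NONE from the partial FieldInfo, then SORTED from the second segment.
        List<Map<String, String>> segments = Arrays.asList(
            Collections.singletonMap("field", "NONE"),
            Collections.singletonMap("field", "SORTED"));
        System.out.println("lenient: " + check(segments, true));
        try {
            check(segments, false);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

With the flag enabled the index opens and keeps the first-seen type; with it disabled the same segments are rejected with the message the reproduction's fail() call uses.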