Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.4
-
None
-
New
Description
This bug was introduced with LUCENE-1219 (only present on 2.4).
The bug happens when merging compressed string fields, but only if bulk-merging code does not apply because the FieldInfos for the segment being merged are not congruent. This test shows the bug:
public void testMergeCompressedFields() throws IOException { File indexDir = new File(System.getProperty("tempDir"), "mergecompressedfields"); Directory dir = FSDirectory.getDirectory(indexDir); try { for(int i=0;i<5;i++) { // Must make a new writer & doc each time, w/ // different fields, so bulk merge of stored fields // cannot run: IndexWriter w = new IndexWriter(dir, new WhitespaceAnalyzer(), i==0, IndexWriter.MaxFieldLength.UNLIMITED); w.setMergeFactor(5); w.setMergeScheduler(new SerialMergeScheduler()); Document doc = new Document(); doc.add(new Field("test1", "this is some data that will be compressed this this this", Field.Store.COMPRESS, Field.Index.NO)); doc.add(new Field("test2", new byte[20], Field.Store.COMPRESS)); doc.add(new Field("field" + i, "random field", Field.Store.NO, Field.Index.TOKENIZED)); w.addDocument(doc); w.close(); } byte[] cmp = new byte[20]; IndexReader r = IndexReader.open(dir); for(int i=0;i<5;i++) { Document doc = r.document(i); assertEquals("this is some data that will be compressed this this this", doc.getField("test1").stringValue()); byte[] b = doc.getField("test2").binaryValue(); assertTrue(Arrays.equals(b, cmp)); } } finally { dir.close(); _TestUtil.rmDir(indexDir); } }
It's because in FieldsReader, when we load a field "for merge" we create a FieldForMerge instance which subsequently does not return the right values for getBinary
{Value,Length,Offset}.