Key: LUCENE-1452
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Michael McCandless
Reporter: Andrzej Bialecki
Votes: 0
Watchers: 0
Lucene - Java

Binary field content lost during optimize

Created: 13/Nov/08 10:31 PM   Updated: 25/Sep/09 04:23 PM
Component/s: Index
Affects Version/s: 2.4, 2.9
Fix Version/s: 2.4.1, 2.9

Time Tracking:
Not Specified

File Attachments:
  binaryField-junit.patch (text file, 3 kB, licensed for inclusion in ASF works), attached 2008-11-13 10:35 PM by Andrzej Bialecki
Environment:
Ubuntu 8.04, x86_64
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

Lucene Fields: New
Resolution Date: 14/Nov/08 10:31 AM


Description
Scenario:
  • create an index with arbitrary content, and close it.
  • open IndexWriter again, and add a document with a binary field (stored but not compressed).
  • close IndexWriter without optimizing, so that the new document is in a separate segment.
  • open IndexReader. You can read the last document and its binary field just fine.
  • open IndexWriter, optimize the index, and close IndexWriter.
  • open IndexReader again. The field is still present (not null) and is marked as binary, but its data is gone: Field.getBinaryLength() returns 0.


Andrzej Bialecki added a comment - 13/Nov/08 10:35 PM
Test case to illustrate the problem. This happens both in 2.4.0 and trunk, although the patch is from trunk.

Unfortunately, I don't know the reason for this behavior, so I can't provide a fix.


Michael McCandless added a comment - 14/Nov/08 09:51 AM
I found the issue. It was caused by LUCENE-1219 (first released in
2.4.0), which added a reuse API to Fieldable for binary fields. When
loading a field for merging we were failing to set the binaryLength.
A similar case affected lazy field merging (I extended the test case
to show it).
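
To make the failure mode concrete, here is a minimal sketch of what goes wrong when a binary value is exposed as a (bytes, offset, length) window and the load path forgets to set the length. It is a simplified model, not the Lucene code: BinaryLengthSketch, BinaryField, loadWithoutLength, loadWithLength and writeStored are hypothetical names invented for illustration.

```java
import java.util.Arrays;

// Simplified, hypothetical model; none of these names are Lucene classes.
class BinaryLengthSketch {

    // A binary value exposed as a (bytes, offset, length) window, so the
    // backing array can be reused instead of copied.
    static final class BinaryField {
        final byte[] bytes;
        final int offset;
        final int length;

        BinaryField(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        // Buggy load path: keeps the bytes but leaves the length at 0.
        static BinaryField loadWithoutLength(byte[] stored) {
            return new BinaryField(stored, 0, 0);
        }

        // Fixed load path: records the full extent of the stored bytes.
        static BinaryField loadWithLength(byte[] stored) {
            return new BinaryField(stored, 0, stored.length);
        }
    }

    // A consumer that trusts offset/length, as a writer copying the field
    // into a newly merged segment would.
    static byte[] writeStored(BinaryField field) {
        return Arrays.copyOfRange(field.bytes, field.offset, field.offset + field.length);
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        System.out.println(writeStored(BinaryField.loadWithoutLength(payload)).length); // prints 0
        System.out.println(writeStored(BinaryField.loadWithLength(payload)).length);    // prints 4
    }
}
```

In this model the field object is never null and still holds its bytes, yet anything written from it is empty, which matches the symptom in the description.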

This is a silent data loss bug. It only affects non-compressed binary
fields. Whenever segments are merged such that the segments' field
numbering is non-congruent (i.e., the same field name was assigned
different field numbers across the segments being merged), binary
fields in those segments are all set to 0 length.
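
As an illustration of "non-congruent", the sketch below checks whether any field name was given different numbers in different segments. It assumes, purely for the example, that a segment's numbering can be modelled as a list of field names whose index is the field number; FieldNumberingSketch and congruent are hypothetical names, and how Lucene actually assigns numbers is not shown here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each segment is a list of field names, and a name's
// index in the list stands in for its field number in that segment.
class FieldNumberingSketch {

    // Returns false as soon as one field name is seen with two different numbers.
    static boolean congruent(List<List<String>> segments) {
        Map<String, Integer> numberByName = new HashMap<>();
        for (List<String> names : segments) {
            for (int number = 0; number < names.size(); number++) {
                Integer previous = numberByName.putIfAbsent(names.get(number), number);
                if (previous != null && previous != number) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("title", "body");
        List<String> same = Arrays.asList("title", "body");
        List<String> later = Arrays.asList("body", "blob"); // "body" is 0 here, 1 in first
        System.out.println(congruent(Arrays.asList(first, same)));  // prints true
        System.out.println(congruent(Arrays.asList(first, later))); // prints false
    }
}
```

The second case resembles the reported scenario: a later segment that introduces a binary field can number its fields differently from the existing segments.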

I will commit shortly.


Michael McCandless added a comment - 14/Nov/08 10:03 AM
Committed revision 713962 to trunk.

I think we should back-port this for a future 2.4.1.


Michael McCandless added a comment - 14/Nov/08 10:31 AM
Committed revision 713970 on 2.4 branch.

Thanks for reporting this, Andrzej!