Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4, 2.9
    • Fix Version/s: 2.4.1, 2.9
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      Ubuntu 8.04, x86_64
      Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

    • Lucene Fields:
      New

      Description

      Scenario:

      • create an index with arbitrary content, and close it
      • open IndexWriter again, and add a document with a binary field (stored but not compressed)
      • close IndexWriter without optimizing, so that the new document is in a separate segment.
      • open IndexReader. You can read the last document and its binary field just fine.
      • open IndexWriter, optimize the index, close IndexWriter
      • open IndexReader. Now the field is still present (not null) and is marked as binary, but the data is gone: Field.getBinaryLength() returns 0.

        Activity

        Andrzej Bialecki added a comment -

        Test case to illustrate the problem. This happens both in 2.4.0 and trunk, although the patch is from trunk.

        Unfortunately, I don't know the reason for this behavior, so I can't provide a fix.

        Michael McCandless added a comment -

        I found the issue. It was caused by LUCENE-1219 (first released in
        2.4.0), which added a reuse API to Fieldable for binary fields. When
        loading a field for merging we were failing to set the binaryLength.
        A similar case affected lazy field merging (I extended the test case
        to show it).

        This is a silent data loss bug. It only affects non-compressed binary
        fields. Whenever segments are merged such that the segments' fields
        are non-congruent (i.e., the same field name was assigned different
        field numbers across the segments being merged), the binary fields in
        those segments are all set to 0 length.

        I will commit shortly.
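
The mechanism described in this comment can be illustrated with a small toy model. This is only a sketch, not Lucene's code: `BinaryMergeSketch`, `ToyBinaryField`, `congruent` and `mergeOneField` are hypothetical names invented for the example, and "field number" is modelled simply as a position in a list of field names. It assumes, as the comment describes, that a binary field is a slice (buffer, offset, length) and that the merge load path stored the buffer without recording the length.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these types are Lucene classes. */
final class BinaryMergeSketch {

    /** A stored binary field that views a slice of a buffer. */
    static final class ToyBinaryField {
        final String name;
        byte[] buffer;
        int offset;
        int length; // stays at the default 0 until a setter fills it in

        ToyBinaryField(String name) {
            this.name = name;
        }

        /** Correct setter: records the buffer together with its valid range. */
        void setValue(byte[] buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        /** Buggy setter: stores the buffer but never records the length. */
        void setValueForgettingLength(byte[] buffer) {
            this.buffer = buffer;
        }

        /** The bytes a writer would copy into the merged segment. */
        byte[] bytesToWrite() {
            return Arrays.copyOfRange(buffer, offset, offset + length);
        }
    }

    /** Two segments are congruent when field number i names the same field in both. */
    static boolean congruent(List<String> fieldNamesA, List<String> fieldNamesB) {
        return fieldNamesA.equals(fieldNamesB);
    }

    /** Loads one stored value into a field and returns what the merge would write. */
    static byte[] mergeOneField(byte[] stored, boolean setLength) {
        ToyBinaryField field = new ToyBinaryField("payload");
        if (setLength) {
            field.setValue(stored, 0, stored.length);
        } else {
            field.setValueForgettingLength(stored);
        }
        return field.bytesToWrite();
    }

    public static void main(String[] args) {
        // "title" is field 0 in the old segment but field 1 in the new one.
        List<String> oldSegment = Arrays.asList("title", "body");
        List<String> newSegment = Arrays.asList("payload", "title");
        System.out.println("congruent: " + congruent(oldSegment, newSegment));

        byte[] stored = {1, 2, 3, 4};
        System.out.println("buggy merge length: " + mergeOneField(stored, false).length);
        System.out.println("fixed merge length: " + mergeOneField(stored, true).length);
    }
}
```

In this model the field object survives the buggy merge and still holds a buffer, but the writer copies zero bytes from it, which matches the reported symptom of a present, binary field whose length is 0.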

        Michael McCandless added a comment -

        Committed revision 713962 to trunk.

        I think we should back-port this for a future 2.4.1.

        Michael McCandless added a comment -

        Committed revision 713970 on 2.4 branch.

        Thanks for reporting this, Andrzej!


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Andrzej Bialecki
          • Votes:
            0
            Watchers:
            1
