Lucene - Core / LUCENE-5975

Lucene can't read 3.0-3.3 deleted documents

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.1
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      BitVector before Lucene 3.4 had many bugs, particularly that it wrote extra bogus trailing bytes at the end of the file.

      But since Lucene 4.8, we check that we read all the bytes. This check can fail for 3.0-3.3 indexes because of those earlier bugs, so users get an exception on open like this: CorruptIndexException(did not read all bytes from file: read 5000 vs 5001....
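To make the failure mode concrete, here is a minimal sketch in plain Java. It does not use Lucene's classes: the length-prefixed file layout, the class name `TrailingBytesDemo`, and the method `readPayload` are all hypothetical and exist only for illustration. It shows how a strict "did we read everything?" check rejects a file whose payload is intact but which carries one stray trailing byte.

```java
import java.nio.ByteBuffer;

public class TrailingBytesDemo {
    /**
     * Reads a 4-byte length prefix followed by that many payload bytes.
     * With strictEof set, any leftover byte is treated as corruption.
     * (Hypothetical format, not the real BitVector file layout.)
     */
    static byte[] readPayload(byte[] file, boolean strictEof) {
        ByteBuffer in = ByteBuffer.wrap(file);
        int length = in.getInt();          // big-endian length prefix
        byte[] payload = new byte[length];
        in.get(payload);
        if (strictEof && in.hasRemaining()) {
            throw new IllegalStateException("did not read all bytes from file: read "
                + in.position() + " vs " + file.length);
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[] clean = {0, 0, 0, 2, 0x0F, 0x01};
        byte[] buggy = {0, 0, 0, 2, 0x0F, 0x01, 0x7F}; // one bogus trailing byte

        System.out.println(readPayload(clean, true).length);  // strict check passes
        System.out.println(readPayload(buggy, false).length); // lenient read still works
        try {
            readPayload(buggy, true);                          // strict check rejects it
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The payload of the "buggy" file is perfectly readable; only the end-of-file check turns it into an error, which is the situation described above for 3.0-3.3 indexes.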

      1. LUCENE-5975.patch
        58 kB
        Robert Muir
      2. LUCENE-5975.patch
        54 kB
        Robert Muir

        Activity

        Robert Muir added a comment -

        Patch. The fix is basically a one-liner:

               if (version >= VERSION_CHECKSUM) {
                 CodecUtil.checkFooter(input);
        -      } else {
        +      } else if (version >= VERSION_DGAPS_CLEARED) {
                 CodecUtil.checkEOF(input);
        -      }
        +      } // otherwise, before this we cannot even check that we read the entire file due to bugs in those versions!!!!
               assert verifyCount();
        

        Patch is huge because the test includes all unique released versions of BitVector.java from 3.x.

        I think this is fine since it only applies to the 4.10 branch anyway; we don't have to carry this crap in trunk or 5.x.
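The control flow of the patched branch can be sketched on its own as follows. The constant names `VERSION_CHECKSUM` and `VERSION_DGAPS_CLEARED` come from the diff above, but their numeric values here, the `TrailerCheck` enum, and the `checkFor` method are assumptions made for illustration; the real code calls `CodecUtil.checkFooter` and `CodecUtil.checkEOF` directly.

```java
public class TrailerCheckSketch {
    // Names taken from the patch; the numeric values are assumed for this sketch.
    static final int VERSION_DGAPS_CLEARED = 1;
    static final int VERSION_CHECKSUM = 2;

    /** Which end-of-file verification a given format version can support. */
    enum TrailerCheck { FOOTER, EOF, NONE }

    static TrailerCheck checkFor(int version) {
        if (version >= VERSION_CHECKSUM) {
            return TrailerCheck.FOOTER;   // newest files carry a checksum footer
        } else if (version >= VERSION_DGAPS_CLEARED) {
            return TrailerCheck.EOF;      // older files: just verify nothing is left unread
        }
        return TrailerCheck.NONE;         // oldest files may have bogus trailing bytes
    }

    public static void main(String[] args) {
        for (int version = 0; version <= 2; version++) {
            System.out.println("version " + version + " -> " + checkFor(version));
        }
    }
}
```

Before the patch, the middle branch was a plain `else`, so the oldest versions also fell into the EOF check and failed on their trailing bytes.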

        Robert Muir added a comment -

        Forgot to 'svn add' my new test

        Ryan Ernst added a comment -

        Nice test!
        +1

        Robert Muir added a comment -

        Thanks Ryan

        I added an additional assert to the test:

            assertEquals(numSet, current.size() - current.count());
        

        and a warning to each backwards file that it should not be modified.

        I also beasted the test.

        I will commit to the 4.10 branch soon. It doesn't need to go anywhere else.
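One reading of the extra assert, sketched with `java.util.BitSet` rather than Lucene's BitVector: this assumes the old files mark deleted documents as set bits while the current reader exposes the inverted (live-document) view, so the number of bits the old writer set should equal `size - count` on the new side. The class and method names below are hypothetical.

```java
import java.util.BitSet;

public class InvertedCountSketch {
    /** Flips every bit in [0, size): "set = deleted" becomes "set = live". */
    static BitSet invert(BitSet deleted, int size) {
        BitSet live = (BitSet) deleted.clone();
        live.flip(0, size);
        return live;
    }

    public static void main(String[] args) {
        int size = 10;
        BitSet deleted = new BitSet(size);
        deleted.set(1);
        deleted.set(4);
        deleted.set(7);
        int numSet = deleted.cardinality();   // bits written by the "old" side

        BitSet live = invert(deleted, size);  // what the "current" side would see
        // Same shape as the assert in the comment: numSet == size - count
        System.out.println(numSet == size - live.cardinality());
    }
}
```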

        ASF subversion and git services added a comment -

        Commit 1627183 from Robert Muir in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1627183 ]

        LUCENE-5975: fix reading of 3.0-3.3 deleted documents

        Uwe Schindler added a comment -

        Thanks for figuring that out!

        Nice test!

        Michael McCandless added a comment -

        Bulk close for Lucene/Solr 4.10.1 release


          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            3
