  1. Lucene - Core
  2. LUCENE-5975

Lucene can't read 3.0-3.3 deleted documents

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.1
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      BitVector before Lucene 3.4 had many bugs, particularly that it wrote extra bogus trailing crap at the end of the file.

      But since Lucene 4.8, we check that we read all the bytes. This check can fail for 3.0-3.3 indexes because of the bugs in those earlier versions, so users get an exception on open like this: CorruptIndexException(did not read all bytes from file: read 5000 vs 5001....
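To make the failure mode concrete, here is a minimal sketch of a strict "did we consume the whole file" check tripping over one trailing byte. It does not use Lucene at all: the class `TrailingByteDemo` and its methods are hypothetical names invented for illustration, and the 5000/5001 sizes simply mirror the numbers in the exception message above.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; none of these names come from Lucene.
class TrailingByteDemo {

  /** Reads payloadLength bytes, then reports how many bytes are left unread. */
  static int unreadAfterPayload(byte[] file, int payloadLength) {
    ByteBuffer buf = ByteBuffer.wrap(file);
    byte[] payload = new byte[payloadLength];
    buf.get(payload); // consume exactly the bytes the reader knows about
    return buf.remaining();
  }

  /** Strict check: throws if the reader did not consume the whole file. */
  static void strictCheckEOF(byte[] file, int payloadLength) {
    if (unreadAfterPayload(file, payloadLength) != 0) {
      throw new IllegalStateException(
          "did not read all bytes from file: read " + payloadLength + " vs " + file.length);
    }
  }

  public static void main(String[] args) {
    strictCheckEOF(new byte[5000], 5000); // well-formed file: passes silently
    try {
      strictCheckEOF(new byte[5001], 5000); // one bogus trailing byte: rejected
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The data the reader cares about is intact in both cases; only the strictness of the end-of-file check decides whether the file opens.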

      1. LUCENE-5975.patch
        54 kB
        Robert Muir
      2. LUCENE-5975.patch
        58 kB
        Robert Muir

        Activity

        rcmuir Robert Muir added a comment -

        Patch. The fix is basically a one-liner:

               if (version >= VERSION_CHECKSUM) {
                 CodecUtil.checkFooter(input);
        -      } else {
        +      } else if (version >= VERSION_DGAPS_CLEARED) {
                 CodecUtil.checkEOF(input);
        -      }
        +      } // otherwise, before this we cannot even check that we read the entire file due to bugs in those versions!!!!
               assert verifyCount();
        

        Patch is huge because the test includes all unique released versions of BitVector.java from 3.x.

        I think this is fine since it only applies to the 4.10 branch anyway; we don't have to carry this crap in trunk or 5.x.
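The three-way decision in the diff above can be sketched in isolation as follows. This is only an illustration, not the patched BitVector code: the class `TailCheckDemo`, the enum, and the method name are hypothetical, and the numeric values given to `VERSION_DGAPS_CLEARED` and `VERSION_CHECKSUM` are placeholder assumptions chosen only so that the ordering matches the diff.

```java
// Hypothetical sketch of the version-gated end-of-file check.
class TailCheckDemo {

  // Placeholder values (assumption); only their relative order matters here.
  static final int VERSION_DGAPS_CLEARED = 1;
  static final int VERSION_CHECKSUM = 2;

  enum TailCheck { FOOTER, EOF, NONE }

  /** Picks which end-of-file verification is safe for a given format version. */
  static TailCheck tailCheckFor(int version) {
    if (version >= VERSION_CHECKSUM) {
      return TailCheck.FOOTER; // newest files: verify the checksum footer
    } else if (version >= VERSION_DGAPS_CLEARED) {
      return TailCheck.EOF;    // older files: at least verify all bytes were read
    }
    return TailCheck.NONE;     // oldest files may carry trailing junk: skip the check
  }

  public static void main(String[] args) {
    for (int v = -1; v <= 2; v++) {
      System.out.println("version " + v + " -> " + tailCheckFor(v));
    }
  }
}
```

Before the patch, the plain `else` branch sent every pre-checksum version to the EOF check, which is what rejected files written by the buggy versions.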

        rcmuir Robert Muir added a comment -

        Forgot to 'svn add' my new test

        rjernst Ryan Ernst added a comment -

        Nice test!
        +1

        rcmuir Robert Muir added a comment -

        Thanks Ryan

        I added an additional assert to the test:

            assertEquals(numSet, current.size() - current.count());
        

        and a warning to each backwards file that it should not be modified.

        I also beasted the test.

        I will commit to the 4.10 branch soon. It doesn't need to go anywhere else.
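The arithmetic behind that extra assert can be illustrated with `java.util.BitSet`. The thread does not say why the expected value is `size() - count()` rather than `count()`; this sketch assumes, purely for illustration, that the vector being checked holds the complement of the bits that were originally written. The class `ComplementCountDemo` and its method are hypothetical names, and `cardinality()` stands in for `count()`.

```java
import java.util.BitSet;

// Hypothetical illustration of the size - count invariant; not the Lucene test.
class ComplementCountDemo {

  /**
   * Complements `written` over [0, size) and returns size minus the number of
   * set bits in the complement. All set bits in `written` must be below size.
   */
  static int clearedBitsInComplement(BitSet written, int size) {
    BitSet current = (BitSet) written.clone();
    current.flip(0, size); // assumed inversion step
    return size - current.cardinality();
  }

  public static void main(String[] args) {
    int size = 100;
    BitSet written = new BitSet(size);
    written.set(3);
    written.set(42);
    written.set(99);
    int numSet = written.cardinality();
    // Under the inversion assumption, the cleared bits equal the bits written.
    System.out.println(numSet == clearedBitsInComplement(written, size));
  }
}
```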

        jira-bot ASF subversion and git services added a comment -

        Commit 1627183 from Robert Muir in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1627183 ]

        LUCENE-5975: fix reading of 3.0-3.3 deleted documents

        thetaphi Uwe Schindler added a comment -

        Thanks for figuring that out!

        Nice test!

        mikemccand Michael McCandless added a comment -

        Bulk close for Lucene/Solr 4.10.1 release


          People

          • Assignee:
            Unassigned
            Reporter:
            rcmuir Robert Muir
