Lucene - Core
  LUCENE-4455

CheckIndex shows wrong segment size in 4.0 because SegmentInfoPerCommit.sizeInBytes counts every file 2 times; check for deletions is negated and results in wrong output

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.0-BETA
    • Fix Version/s: 4.0, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I found this bug in 4.0-RC1 when I compared the checkindex outputs for 4.0 and 3.6.1:

      • The segment size is twice as big as reported by "ls -lh". The reason is that SegmentInfoPerCommit.sizeInBytes counts every file twice. This may not seem serious (it is just a statistic), but MergePolicy uses this size to choose merges. On the other hand, if all segments are reported as twice as big, merging behaviour should not be affected (unless absolute sizes in megabytes are used). So we should really fix this. Sorry for investigating this so late!
      • The deletion status of the segments is inverted. Segments that have no deletions are reported as having deletions but with delGen=-1, and those with deletions show "no deletions". This is not serious, but should be fixed, too.
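
      The double counting in the first point can be illustrated with a minimal sketch. This is not the Lucene source: the class name, method names, file names and sizes are all hypothetical, and the sketch only assumes that the size is built from a core-file total to which the lengths of all segment files are then added again.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the Lucene source.
class SegmentSizeSketch {

    // Sums the lengths of the named files, each counted once.
    static long sumLengths(Map<String, Long> fileLengths, List<String> names) {
        long sum = 0;
        for (String name : names) {
            sum += fileLengths.get(name);
        }
        return sum;
    }

    // Double-counting variant: starts from the size of the core files, then
    // adds every file of the segment again, so core files are counted twice.
    static long doubleCountedSize(Map<String, Long> fileLengths,
                                  List<String> coreFiles,
                                  List<String> allFiles) {
        long sum = sumLengths(fileLengths, coreFiles);
        sum += sumLengths(fileLengths, allFiles);
        return sum;
    }

    // Corrected variant: every file of the segment is counted exactly once.
    static long size(Map<String, Long> fileLengths, List<String> allFiles) {
        return sumLengths(fileLengths, allFiles);
    }

    public static void main(String[] args) {
        // Hypothetical file names and sizes in bytes.
        Map<String, Long> fileLengths = new LinkedHashMap<>();
        fileLengths.put("_0.fdt", 100L);
        fileLengths.put("_0.tim", 50L);
        List<String> files = Arrays.asList("_0.fdt", "_0.tim");

        System.out.println(doubleCountedSize(fileLengths, files, files)); // 300
        System.out.println(size(fileLengths, files));                     // 150
    }
}
```

      When a segment has no files beyond its core files, the double-counted figure is exactly twice the real size, which would match a CheckIndex size that is double what "ls -lh" shows.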

      There is one "bug" in sizeInBytes that we should NOT fix: for 3.x indexes that come from 3.0 and have shared doc stores, the size is overestimated. But that's fine. In this case, the index consisted of a 3.6.1 segment and a 4.0 segment, and both showed double size.
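
      Separately from the size statistics, the inverted deletions report described in the list above is the kind of output a negated condition produces. The sketch below is hypothetical (class, methods and message strings are made up for illustration, not taken from CheckIndex) and assumes that a delGen of -1 means the segment has no deletions.

```java
// Hypothetical illustration only; this is not the CheckIndex source.
class DeletionReportSketch {

    // Assumption: a deletion generation of -1 marks a segment without deletions.
    static boolean hasDeletions(long delGen) {
        return delGen != -1;
    }

    // Inverted variant: the condition is negated, so segments without
    // deletions are reported as having them, with delGen=-1.
    static String invertedReport(long delGen) {
        if (!hasDeletions(delGen)) {
            return "has deletions [delGen=" + delGen + "]";
        }
        return "no deletions";
    }

    // Corrected variant: the report follows the actual deletion state.
    static String report(long delGen) {
        if (hasDeletions(delGen)) {
            return "has deletions [delGen=" + delGen + "]";
        }
        return "no deletions";
    }

    public static void main(String[] args) {
        System.out.println(invertedReport(-1)); // has deletions [delGen=-1]
        System.out.println(invertedReport(3));  // no deletions
        System.out.println(report(-1));         // no deletions
        System.out.println(report(3));          // has deletions [delGen=3]
    }
}
```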

      1. LUCENE-4455.patch
        5 kB
        Michael McCandless

        Activity

        Robert Muir added a comment -

        Thanks Uwe for finding this!

        Michael McCandless added a comment -

        Patch w/ tests + fixes.

        Robert Muir added a comment -

        +1

        Uwe Schindler added a comment -

        Thanks for fixing!

        Michael McCandless added a comment -

        Thanks Uwe! Keep testing

        Commit Tag Bot added a comment -

        [branch_4x commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1392482

        LUCENE-4455: fix SIPC.sizeInBytes() to not double-count; fix CheckIndex to not reverse 'has deletions'/'no deletions'

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Uwe Schindler
          • Votes:
            0
            Watchers:
            2
