Lucene - Core
  1. Lucene - Core
  2. LUCENE-1474

Incorrect SegmentInfo.delCount when IndexReader.flush() is used

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.4.1, 2.9
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When deleted documents are flushed using IndexReader.flush() the delCount in SegmentInfo is updated based on the current value and SegmentReader.pendingDeleteCount (introduced by LUCENE-1267). It seems that pendingDeleteCount is not reset after the commit, which means after a second flush() or close() of an index reader the delCount in SegmentInfo is incorrect. A subsequent IndexReader.open() call will fail with an error when assertions are enabled. E.g.:

      java.lang.AssertionError: delete count mismatch: info=3 vs BitVector=2
      at org.apache.lucene.index.SegmentReader.loadDeletedDocs(SegmentReader.java:405)
      [...]

      1. CheckIndex.txt
        66 kB
        Erik van Zijst
      2. IndexReaderTest.java
        1 kB
        Marcel Reutegger

        Activity

        Hide
        Marcel Reutegger added a comment -

        Test case to reproduce issue.

        Show
        Marcel Reutegger added a comment - Test case to reproduce issue.
        Hide
        Michael McCandless added a comment -

        Good catch! Great test case... thanks Marcel.

        I'll commit the fix (to set pendingDeleteCount to 0 after flushing deletes) shortly.

        Show
        Michael McCandless added a comment - Good catch! Great test case... thanks Marcel. I'll commit the fix (to set pendingDeleteCount to 0 after flushing deletes) shortly.
        Hide
        Michael McCandless added a comment -

        Committed revision 722645.

        Thanks Marcel!

        Show
        Michael McCandless added a comment - Committed revision 722645. Thanks Marcel!
        Hide
        Marcel Reutegger added a comment -

        Are there any plans to release a 2.4.1? If yes, can this fix be included?

        Show
        Marcel Reutegger added a comment - Are there any plans to release a 2.4.1? If yes, can this fix be included?
        Hide
        Michael McCandless added a comment -

        There are no concrete plans / timing for a 2.4.1 release, but I think it's likely it'll eventually happen if we accumulate enough issues to backport.

        I'll reopen this & add 2.4.1 fix version so we don't lose track of it...

        Show
        Michael McCandless added a comment - There are no concrete plans / timing for a 2.4.1 release, but I think it's likely it'll eventually happen if we accumulate enough issues to backport. I'll reopen this & add 2.4.1 fix version so we don't lose track of it...
        Hide
        Michael McCandless added a comment -

        Committed revision 722964 on 2.4 branch.

        Show
        Michael McCandless added a comment - Committed revision 722964 on 2.4 branch.
        Hide
        Michael McCandless added a comment -

        Reopening for backport to 2.4.1.

        Show
        Michael McCandless added a comment - Reopening for backport to 2.4.1.
        Hide
        Michael McCandless added a comment -

        Already backported to 2.4 branch.

        Show
        Michael McCandless added a comment - Already backported to 2.4 branch.
        Hide
        Erik van Zijst added a comment - - edited

        For some time now we've been getting similar assertion errors on our development machines when running with assertions enabled (disabling them kills the assertion error and doesn't appear to have any negative impact though).

        We're running with lucene 2.4.1 and the unit test attached to this issue succeeds as expected, but our own code still appears to trigger a similar problem:

        [java] ERROR - Problem processing revisions from repo https://studio.atlassian.com/svn/FE due to class java.lang.AssertionError - delete count mismatch: info=128 vs BitVector=126
        [java] java.lang.AssertionError: delete count mismatch: info=128 vs BitVector=126
        [java] at org.apache.lucene.index.SegmentReader.loadDeletedDocs(SegmentReader.java:405)
        [java] at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:368)
        [java] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:306)
        [java] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:228)
        [java] at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55)
        [java] at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:104)
        [java] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
        [java] at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:115)
        [java] at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
        [java] at org.apache.lucene.index.IndexReader.open(IndexReader.java:216)
        [java] at com.cenqua.fisheye.lucene.LuceneConnection.getReader(LuceneConnection.java:309)
        [java] at com.cenqua.fisheye.lucene.LuceneConnection.withReaderAndDeletes(LuceneConnection.java:158)
        [java] at com.cenqua.fisheye.rep.RepositoryIndexer.deleteDocuments(RepositoryIndexer.java:146)
        [java] at com.cenqua.fisheye.rep.RepositoryIndexer.deleteContent(RepositoryIndexer.java:138)
        [java] at com.cenqua.fisheye.rep.RepositoryScanner.deleteContent(RepositoryScanner.java:594)
        [java] at com.cenqua.fisheye.rep.RepositoryScanner.indexContent(RepositoryScanner.java:578)
        [java] at com.cenqua.fisheye.rep.RepositoryScanner.processContent(RepositoryScanner.java:432)
        [java] at com.cenqua.fisheye.rep.RepositoryScanner.slurpRepository(RepositoryScanner.java:263)
        [java] at com.cenqua.fisheye.rep.RepositoryScanner.slurp(RepositoryScanner.java:175)
        [java] at com.cenqua.fisheye.rep.RepositoryScanner.ping(RepositoryScanner.java:108)
        [java] at com.cenqua.fisheye.svn.SvnRepositoryEngine.doSlurp(SvnRepositoryEngine.java:68)
        [java] at com.cenqua.fisheye.rep.ping.OneOffPingRequest.doRequest(OneOffPingRequest.java:30)
        [java] at com.cenqua.fisheye.rep.ping.PingRequest.process(PingRequest.java:90)
        [java] at com.cenqua.fisheye.rep.RepositoryHandle.processPingRequests(RepositoryHandle.java:119)
        [java] at com.cenqua.fisheye.rep.RepositoryHandle.queuePingRequest(RepositoryHandle.java:109)
        [java] at com.cenqua.fisheye.rep.ping.PingRequest.run(PingRequest.java:57)
        [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        [java] at java.lang.Thread.run(Thread.java:613)

        cheers,
        Erik
        Atlassian

        Show
        Erik van Zijst added a comment - - edited For some time now we've been getting similar assertion errors on our development machines when running with assertions enabled (disabling them kills the assertion error and doesn't appear to have any negative impact though). We're running with lucene 2.4.1 and the unit test attached to this issue succeeds as expected, but our own code still appears to trigger a similar problem: [java] ERROR - Problem processing revisions from repo https://studio.atlassian.com/svn/FE due to class java.lang.AssertionError - delete count mismatch: info=128 vs BitVector=126 [java] java.lang.AssertionError: delete count mismatch: info=128 vs BitVector=126 [java] at org.apache.lucene.index.SegmentReader.loadDeletedDocs(SegmentReader.java:405) [java] at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:368) [java] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:306) [java] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:228) [java] at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55) [java] at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:104) [java] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653) [java] at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:115) [java] at org.apache.lucene.index.IndexReader.open(IndexReader.java:316) [java] at org.apache.lucene.index.IndexReader.open(IndexReader.java:216) [java] at com.cenqua.fisheye.lucene.LuceneConnection.getReader(LuceneConnection.java:309) [java] at com.cenqua.fisheye.lucene.LuceneConnection.withReaderAndDeletes(LuceneConnection.java:158) [java] at com.cenqua.fisheye.rep.RepositoryIndexer.deleteDocuments(RepositoryIndexer.java:146) [java] at com.cenqua.fisheye.rep.RepositoryIndexer.deleteContent(RepositoryIndexer.java:138) [java] at com.cenqua.fisheye.rep.RepositoryScanner.deleteContent(RepositoryScanner.java:594) [java] at com.cenqua.fisheye.rep.RepositoryScanner.indexContent(RepositoryScanner.java:578) [java] at com.cenqua.fisheye.rep.RepositoryScanner.processContent(RepositoryScanner.java:432) [java] at com.cenqua.fisheye.rep.RepositoryScanner.slurpRepository(RepositoryScanner.java:263) [java] at com.cenqua.fisheye.rep.RepositoryScanner.slurp(RepositoryScanner.java:175) [java] at com.cenqua.fisheye.rep.RepositoryScanner.ping(RepositoryScanner.java:108) [java] at com.cenqua.fisheye.svn.SvnRepositoryEngine.doSlurp(SvnRepositoryEngine.java:68) [java] at com.cenqua.fisheye.rep.ping.OneOffPingRequest.doRequest(OneOffPingRequest.java:30) [java] at com.cenqua.fisheye.rep.ping.PingRequest.process(PingRequest.java:90) [java] at com.cenqua.fisheye.rep.RepositoryHandle.processPingRequests(RepositoryHandle.java:119) [java] at com.cenqua.fisheye.rep.RepositoryHandle.queuePingRequest(RepositoryHandle.java:109) [java] at com.cenqua.fisheye.rep.ping.PingRequest.run(PingRequest.java:57) [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650) [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) [java] at java.lang.Thread.run(Thread.java:613) cheers, Erik Atlassian
        Hide
        Michael McCandless added a comment -

        Erik, do you use undeleteAll? I've found one path, using undeleteAll, that could cause this.

        Show
        Michael McCandless added a comment - Erik, do you use undeleteAll? I've found one path, using undeleteAll, that could cause this.
        Hide
        Erik van Zijst added a comment -

        Erik, do you use undeleteAll? I've found one path, using undeleteAll, that could cause this.

        No, we don't call IndexReader.undeleteAll().

        Show
        Erik van Zijst added a comment - Erik, do you use undeleteAll? I've found one path, using undeleteAll, that could cause this. No, we don't call IndexReader.undeleteAll() .
        Hide
        Michael McCandless added a comment -

        Are you able to run a modified Lucene JAR? If so, I'd like to add an assert at the point when Lucene commits deletes so we can catch earlier when this happens.

        Show
        Michael McCandless added a comment - Are you able to run a modified Lucene JAR? If so, I'd like to add an assert at the point when Lucene commits deletes so we can catch earlier when this happens.
        Hide
        Erik van Zijst added a comment -

        Sure, I'd be happy to.

        Show
        Erik van Zijst added a comment - Sure, I'd be happy to.
        Hide
        Michael McCandless added a comment -

        OK, thanks. Let's try this:

        Index: src/java/org/apache/lucene/index/SegmentReader.java
        ===================================================================
        --- src/java/org/apache/lucene/index/SegmentReader.java	(revision 768598)
        +++ src/java/org/apache/lucene/index/SegmentReader.java	(working copy)
        @@ -592,6 +592,7 @@
               
               si.setDelCount(si.getDelCount()+pendingDeleteCount);
               pendingDeleteCount = 0;
        +      assert deletedDocs.count() == si.getDelCount(): "delete count mismatch during commit: info=" + si.getDelCount() + " vs BitVector=" + deletedDocs.count();
             }
             if (undeleteAll && si.hasDeletions()) {
               si.clearDelGen();
        

        Hopefully you see that assert trip at some point "interesting" (ie, before the assert above).

        Show
        Michael McCandless added a comment - OK, thanks. Let's try this: Index: src/java/org/apache/lucene/index/SegmentReader.java =================================================================== --- src/java/org/apache/lucene/index/SegmentReader.java (revision 768598) +++ src/java/org/apache/lucene/index/SegmentReader.java (working copy) @@ -592,6 +592,7 @@ si.setDelCount(si.getDelCount()+pendingDeleteCount); pendingDeleteCount = 0; + assert deletedDocs.count() == si.getDelCount(): "delete count mismatch during commit: info=" + si.getDelCount() + " vs BitVector=" + deletedDocs.count(); } if (undeleteAll && si.hasDeletions()) { si.clearDelGen(); Hopefully you see that assert trip at some point "interesting" (ie, before the assert above).
        Hide
        Erik van Zijst added a comment -

        Should that be applied to tags/lucene_2_4_1 or trunk (the svn revision numbers don't seem to match those from your patch)?

        Show
        Erik van Zijst added a comment - Should that be applied to tags/lucene_2_4_1 or trunk (the svn revision numbers don't seem to match those from your patch)?
        Hide
        Michael McCandless added a comment -

        Sorry, 2.4.1

        Show
        Michael McCandless added a comment - Sorry, 2.4.1
        Hide
        Erik van Zijst added a comment -

        Running with the patch applied doesn't seem to have changed anything. Is it possible the damage to the index has already been done at this point?

        Note that I'm running lucene on large index files that were taken from our production servers. We've never seen these exception on these production servers, but then they do not run with assertions enabled. Using these copied index files (safely copied while the product was not running) on development machines consistently triggers this problem. I've not regenerated the entire index on the development machine, for that would be hugely impractical.

        Show
        Erik van Zijst added a comment - Running with the patch applied doesn't seem to have changed anything. Is it possible the damage to the index has already been done at this point? Note that I'm running lucene on large index files that were taken from our production servers. We've never seen these exception on these production servers, but then they do not run with assertions enabled. Using these copied index files (safely copied while the product was not running) on development machines consistently triggers this problem. I've not regenerated the entire index on the development machine, for that would be hugely impractical.
        Hide
        Michael McCandless added a comment -

        Yes, the damage once done will remain in the index.

        This new assert will only trip if the index is recreated, ie when a segment is first written with the wrong count.

        Can you run CheckIndex on your index and report back? I'm curious how many segments show the wrong delete count, and how much the counts are off.

        Show
        Michael McCandless added a comment - Yes, the damage once done will remain in the index. This new assert will only trip if the index is recreated, ie when a segment is first written with the wrong count. Can you run CheckIndex on your index and report back? I'm curious how many segments show the wrong delete count, and how much the counts are off.
        Hide
        Michael McCandless added a comment -

        Was the index at least partially generated with 2.4.0? Or was it entirely generated with 2.4.1?

        Show
        Michael McCandless added a comment - Was the index at least partially generated with 2.4.0? Or was it entirely generated with 2.4.1?
        Hide
        Erik van Zijst added a comment - - edited

        I have attached the output of CheckIndex on all our index directories, which seems to report quite a few errors:

        erik:cache ervzijst$ grep "CorruptIndexException|AssertionError" CheckIndex.txt
        java.lang.AssertionError: delete count mismatch: info=1263 vs BitVector=1262
        java.lang.AssertionError: delete count mismatch: info=496 vs BitVector=493
        java.lang.AssertionError: delete count mismatch: info=101 vs BitVector=100
        java.lang.AssertionError: delete count mismatch: info=300 vs BitVector=298
        java.lang.AssertionError: delete count mismatch: info=109 vs BitVector=108
        java.lang.AssertionError: delete count mismatch: info=140 vs BitVector=139
        java.lang.AssertionError: delete count mismatch: info=122 vs BitVector=121
        java.lang.AssertionError: delete count mismatch: info=91 vs BitVector=89
        java.lang.AssertionError: delete count mismatch: info=1411 vs BitVector=1409
        java.lang.AssertionError: delete count mismatch: info=801 vs BitVector=800
        java.lang.AssertionError: delete count mismatch: info=630 vs BitVector=629
        java.lang.AssertionError: delete count mismatch: info=510 vs BitVector=508
        org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _0: fieldsReader shows 12365 but segmentInfo shows 12232
        org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _1: fieldsReader shows 10144 but segmentInfo shows 8766
        org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _2: fieldsReader shows 4616 but segmentInfo shows 7006
        org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _3: fieldsReader shows 6681 but segmentInfo shows 4854
        org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _4: fieldsReader shows 2652 but segmentInfo shows 8808
        org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _5: fieldsReader shows 11500 but segmentInfo shows 14551
        org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _6: fieldsReader shows 16225 but segmentInfo shows 4375
        erik:cache ervzijst$

        P.S.
        The full log is attached as CheckIndex.txt

        Show
        Erik van Zijst added a comment - - edited I have attached the output of CheckIndex on all our index directories, which seems to report quite a few errors: erik:cache ervzijst$ grep "CorruptIndexException|AssertionError" CheckIndex.txt java.lang.AssertionError: delete count mismatch: info=1263 vs BitVector=1262 java.lang.AssertionError: delete count mismatch: info=496 vs BitVector=493 java.lang.AssertionError: delete count mismatch: info=101 vs BitVector=100 java.lang.AssertionError: delete count mismatch: info=300 vs BitVector=298 java.lang.AssertionError: delete count mismatch: info=109 vs BitVector=108 java.lang.AssertionError: delete count mismatch: info=140 vs BitVector=139 java.lang.AssertionError: delete count mismatch: info=122 vs BitVector=121 java.lang.AssertionError: delete count mismatch: info=91 vs BitVector=89 java.lang.AssertionError: delete count mismatch: info=1411 vs BitVector=1409 java.lang.AssertionError: delete count mismatch: info=801 vs BitVector=800 java.lang.AssertionError: delete count mismatch: info=630 vs BitVector=629 java.lang.AssertionError: delete count mismatch: info=510 vs BitVector=508 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _0: fieldsReader shows 12365 but segmentInfo shows 12232 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _1: fieldsReader shows 10144 but segmentInfo shows 8766 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _2: fieldsReader shows 4616 but segmentInfo shows 7006 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _3: fieldsReader shows 6681 but segmentInfo shows 4854 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _4: fieldsReader shows 2652 but segmentInfo shows 8808 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _5: fieldsReader shows 11500 but segmentInfo shows 14551 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _6: fieldsReader shows 16225 but segmentInfo shows 4375 erik:cache ervzijst$ P.S. The full log is attached as CheckIndex.txt
        Hide
        Michael McCandless added a comment -

        Thanks Erik. Can you answer my other questions?

        Show
        Michael McCandless added a comment - Thanks Erik. Can you answer my other questions?
        Hide
        Adrian Hempel added a comment -

        Hi Michael,

        The index that Erik was working with contained segments created with a pre-2.4.1 version of Lucene, so we don't believe this is a regression.

        Regards,
        Adrian

        Show
        Adrian Hempel added a comment - Hi Michael, The index that Erik was working with contained segments created with a pre-2.4.1 version of Lucene, so we don't believe this is a regression. Regards, Adrian
        Hide
        Michael McCandless added a comment -

        OK that's good news: the bug is fixed in 2.4.1, but since your index had segments from before 2.4.1, it had the bug "cached".

        Show
        Michael McCandless added a comment - OK that's good news: the bug is fixed in 2.4.1, but since your index had segments from before 2.4.1, it had the bug "cached".

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Marcel Reutegger
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development