Lucene - Core
  1. Lucene - Core
  2. LUCENE-6299

IndexWriter's enforcement of 2.1B doc limits is buggy

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.4, 5.0, 5.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      E.g. if you pass an already > 2.1B docs to either addIndexes, it can fail to enforce properly.

      IW's private reserveDocs should refuse to accept negative values.

      IW.deleteAll fails to set the pendingNumDocs to 0.

      1. LUCENE-6299_addIndexes.patch
        12 kB
        Robert Muir
      2. LUCENE-6299.patch
        47 kB
        Robert Muir
      3. LUCENE-6299.patch
        44 kB
        Robert Muir
      4. LUCENE-6299.patch
        42 kB
        Michael McCandless
      5. LUCENE-6299.patch
        43 kB
        Michael McCandless
      6. LUCENE-6299-410x.patch
        13 kB
        Michael McCandless

        Activity

        Hide
        Robert Muir added a comment -

        I wish the exception was different when it strikes at runtime too. For a DirectoryReader, IMO it should be a CorruptIndexException, since IndexWriter prevents it, and thats how it behaves to users. On the other hand for a MultiReader, the current IllegalArgument is good (you multi'd too much stuff).

        Show
        Robert Muir added a comment - I wish the exception was different when it strikes at runtime too. For a DirectoryReader, IMO it should be a CorruptIndexException, since IndexWriter prevents it, and thats how it behaves to users. On the other hand for a MultiReader, the current IllegalArgument is good (you multi'd too much stuff).
        Hide
        Robert Muir added a comment -

        Here is a patch for addIndexes.
        we return an error before we do any copying, and don't allow any overflows.

        The test is kinda evil in the way it cheats, but its fast and does the job.

        disk full testing in general was broken for addIndexes in case of FSDir, because MockDir would delegate optional copyBytes method directly, and not count those bytes. Instead this mess of optional methods is cleaned up, so that MockDir checks everything, but when we run tests with the "raw" dir, we delegate everything directly to it (except close, for checkindex).

        Show
        Robert Muir added a comment - Here is a patch for addIndexes. we return an error before we do any copying, and don't allow any overflows. The test is kinda evil in the way it cheats, but its fast and does the job. disk full testing in general was broken for addIndexes in case of FSDir, because MockDir would delegate optional copyBytes method directly, and not count those bytes. Instead this mess of optional methods is cleaned up, so that MockDir checks everything, but when we run tests with the "raw" dir, we delegate everything directly to it (except close, for checkindex).
        Hide
        Michael McCandless added a comment -

        Thanks Rob, here's a new patch merged with yours.

        I added tests + fixes for the other issues, and discovered a
        ridiculous (index corruption) bug: on init IndexWriter does not set
        its pendingNumDocs to the number of docs already in the index.

        So this check is completely broken and it's trivial to make a corrupt
        index today! Grrr... I added a test and fixed that.

        I also added a test for addIndexes(CodecReader[]) too, fixed
        BaseCompositeReader to throw CorruptIndexException when it's a
        DirectoryReader, changed the exception to IllegalArgumentException,
        changed reserveDocs to just take a long and simplified the checking in
        addIndexes (just calls reserveDocs).

        Given that this is an index corruption issue I think we should
        back-port for 4.10.4, but on back-port I would only do the minimal bug
        fixes here.

        Show
        Michael McCandless added a comment - Thanks Rob, here's a new patch merged with yours. I added tests + fixes for the other issues, and discovered a ridiculous (index corruption) bug: on init IndexWriter does not set its pendingNumDocs to the number of docs already in the index. So this check is completely broken and it's trivial to make a corrupt index today! Grrr... I added a test and fixed that. I also added a test for addIndexes(CodecReader[]) too, fixed BaseCompositeReader to throw CorruptIndexException when it's a DirectoryReader, changed the exception to IllegalArgumentException, changed reserveDocs to just take a long and simplified the checking in addIndexes (just calls reserveDocs). Given that this is an index corruption issue I think we should back-port for 4.10.4, but on back-port I would only do the minimal bug fixes here.
        Hide
        Robert Muir added a comment -

        We should rethink how the patch adds CorruptIndexException to DR etc signatures. I don't like this. I would prefer IOException.

        making it corruptindexexception means every single user has to think about this crazy ass corner case, then they are left thinking 'ok what should my catch block do if the index is corrupt'. It makes things too hard to use.

        Show
        Robert Muir added a comment - We should rethink how the patch adds CorruptIndexException to DR etc signatures. I don't like this. I would prefer IOException. making it corruptindexexception means every single user has to think about this crazy ass corner case, then they are left thinking 'ok what should my catch block do if the index is corrupt'. It makes things too hard to use.
        Hide
        Robert Muir added a comment -

        Also on 4.x, we should not change the signatures of DR etc to 'throws IOException' either, I think its too heavy of a break?

        Lets just sneaky-throw the CorruptIndexException or something for 4.10.x?

        Show
        Robert Muir added a comment - Also on 4.x, we should not change the signatures of DR etc to 'throws IOException' either, I think its too heavy of a break? Lets just sneaky-throw the CorruptIndexException or something for 4.10.x?
        Hide
        Michael McCandless added a comment -

        I would prefer IOException.

        IOExc is a good idea, I'll switch to that.

        Also on 4.x, we should not change the signatures of DR etc to 'throws IOException' either, I think its too heavy of a break?

        I think we shouldn't backport that change to 4.x? I was just going to fix the IW bugs and add the test cases (except maybe the addIndexes test cases since they added RawDirectoryWrapper).

        Show
        Michael McCandless added a comment - I would prefer IOException. IOExc is a good idea, I'll switch to that. Also on 4.x, we should not change the signatures of DR etc to 'throws IOException' either, I think its too heavy of a break? I think we shouldn't backport that change to 4.x? I was just going to fix the IW bugs and add the test cases (except maybe the addIndexes test cases since they added RawDirectoryWrapper).
        Hide
        Robert Muir added a comment -

        sounds great.

        Show
        Robert Muir added a comment - sounds great.
        Hide
        Michael McCandless added a comment -

        Here's what I'd like to commit for 4.10.x. It fixes IW.deleteAll to set pending doc count to 0, IW ctor to set pending doc count to maxDoc, and both addIndexes to not overflow int when incoming indices are already too many docs. It also switches to IllegalArgumentExc.

        Show
        Michael McCandless added a comment - Here's what I'd like to commit for 4.10.x. It fixes IW.deleteAll to set pending doc count to 0, IW ctor to set pending doc count to maxDoc, and both addIndexes to not overflow int when incoming indices are already too many docs. It also switches to IllegalArgumentExc.
        Hide
        Michael McCandless added a comment -

        New patch for 5.x / trunk, cutting over to IOException. I think it's ready...

        Show
        Michael McCandless added a comment - New patch for 5.x / trunk, cutting over to IOException. I think it's ready...
        Hide
        Robert Muir added a comment -

        Can we fix javadocs on SIS.totalDocCount() [its being used more often here]. Its extremely confusing as I try to think about deletions, since it says it does not include deletions, but in fact it does.

        Show
        Robert Muir added a comment - Can we fix javadocs on SIS.totalDocCount() [its being used more often here] . Its extremely confusing as I try to think about deletions, since it says it does not include deletions, but in fact it does.
        Hide
        Robert Muir added a comment -

        I also think we shoudl open a followup issues to rename all these to be intuitive, e.g. totalNumDocs() -> totalMaxDoc(), getDocCount() -> maxDoc() and so on. this will make life easier on everyone and prevent bugs.

        Show
        Robert Muir added a comment - I also think we shoudl open a followup issues to rename all these to be intuitive, e.g. totalNumDocs() -> totalMaxDoc(), getDocCount() -> maxDoc() and so on. this will make life easier on everyone and prevent bugs.
        Hide
        Robert Muir added a comment -

        I think i figured out the problem i found with MDW. if you accdientally do MDW(MDW) i was seeing corruption, i think the problem is that MDW sometimes doesnt delegate fsync calls (to speed up tests). if you look at the code you will see it does a crazy unwrap-instanceof NRTCaching because this behavior is already problematic there.

        Now we have DisableFSyncFS inserted most of the time into tests, so i think this should just be removed. This means the fsync calls are no-ops but its happening at a lower level and Directories are unaware.

        Show
        Robert Muir added a comment - I think i figured out the problem i found with MDW. if you accdientally do MDW(MDW ) i was seeing corruption, i think the problem is that MDW sometimes doesnt delegate fsync calls (to speed up tests). if you look at the code you will see it does a crazy unwrap-instanceof NRTCaching because this behavior is already problematic there. Now we have DisableFSyncFS inserted most of the time into tests, so i think this should just be removed. This means the fsync calls are no-ops but its happening at a lower level and Directories are unaware.
        Hide
        Michael McCandless added a comment -

        Can we fix javadocs on SIS.totalDocCount()

        Good point, I'll change to:

          /** Returns sum of all segment's maxDoc. */
        

        I'll open a separate issue to rename SIS.totalDocCount -> getTotalMaxDoc and SI.docCount SI.maxDoc.

        I also think we shoudl open a followup issues to rename all these to be intuitive,

        +1, I'll open followup.

        Now we have DisableFSyncFS inserted most of the time into tests, so i think this should just be removed.

        +1, sneaky!

        Show
        Michael McCandless added a comment - Can we fix javadocs on SIS.totalDocCount() Good point, I'll change to: /** Returns sum of all segment's maxDoc. */ I'll open a separate issue to rename SIS.totalDocCount -> getTotalMaxDoc and SI.docCount SI.maxDoc. I also think we shoudl open a followup issues to rename all these to be intuitive, +1, I'll open followup. Now we have DisableFSyncFS inserted most of the time into tests, so i think this should just be removed. +1, sneaky!
        Hide
        Robert Muir added a comment -

        I updated patch with the MDW fixes, and added a test.

        Show
        Robert Muir added a comment - I updated patch with the MDW fixes, and added a test.
        Hide
        Michael McCandless added a comment -

        Thanks for fixing MDW Rob!

        Show
        Michael McCandless added a comment - Thanks for fixing MDW Rob!
        Hide
        Robert Muir added a comment -

        I don't like how deleteAll unconditionally sets pendingNumDocs. We are synced on lots of things, but everywhere else except startup is using atomic addAndGet with deltas so i don't worry. Can we do the same here?

        I also don't like how addindexes now does the reserveDocs before a bunch of i/o operations. stuff could go wrong during copySegmentAsIs/etc and this is actually fine, it wont corrupt the writer, so why should it leave it with a bloated pendingNumDocs? Previously this happened right before checkpoint.

        Show
        Robert Muir added a comment - I don't like how deleteAll unconditionally sets pendingNumDocs. We are synced on lots of things, but everywhere else except startup is using atomic addAndGet with deltas so i don't worry. Can we do the same here? I also don't like how addindexes now does the reserveDocs before a bunch of i/o operations. stuff could go wrong during copySegmentAsIs/etc and this is actually fine, it wont corrupt the writer, so why should it leave it with a bloated pendingNumDocs? Previously this happened right before checkpoint.
        Hide
        Robert Muir added a comment -

        Also the new ctor check at startup, we don't check that this value is within bounds. Given we know this stuff didnt work right/exist before, I think it would be nice to check and throw corruptindexexception on startup.

        Show
        Robert Muir added a comment - Also the new ctor check at startup, we don't check that this value is within bounds. Given we know this stuff didnt work right/exist before, I think it would be nice to check and throw corruptindexexception on startup.
        Hide
        Robert Muir added a comment -

        Updated patch. I added a test for opening a writer on a too-big index.

        SIS.readCommit() checks and throws CorruptIndexException. This also means tools like CheckIndex detect it too always. I also added an assertion inside its totalNumDocs method.

        Show
        Robert Muir added a comment - Updated patch. I added a test for opening a writer on a too-big index. SIS.readCommit() checks and throws CorruptIndexException. This also means tools like CheckIndex detect it too always. I also added an assertion inside its totalNumDocs method.
        Hide
        Michael McCandless added a comment -

        I don't like how deleteAll unconditionally sets pendingNumDocs. We are synced on lots of things, but everywhere else except startup is using atomic addAndGet with deltas so i don't worry. Can we do the same here?

        OK, you mean something like pendingNumDocs.addAndGet(-pendingNumDocs.get())?

        I also don't like how addindexes now does the reserveDocs before a bunch of i/o operations.

        That's a good point ... I'll put it back where it was before, and put the "best effort" check back in both addIndexes.

        Show
        Michael McCandless added a comment - I don't like how deleteAll unconditionally sets pendingNumDocs. We are synced on lots of things, but everywhere else except startup is using atomic addAndGet with deltas so i don't worry. Can we do the same here? OK, you mean something like pendingNumDocs.addAndGet(-pendingNumDocs.get())? I also don't like how addindexes now does the reserveDocs before a bunch of i/o operations. That's a good point ... I'll put it back where it was before, and put the "best effort" check back in both addIndexes.
        Show
        Michael McCandless added a comment - I made a branch here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6299
        Hide
        ASF subversion and git services added a comment -

        Commit 1662503 from Michael McCandless in branch 'dev/branches/lucene6299'
        [ https://svn.apache.org/r1662503 ]

        LUCENE-6299: make branch

        Show
        ASF subversion and git services added a comment - Commit 1662503 from Michael McCandless in branch 'dev/branches/lucene6299' [ https://svn.apache.org/r1662503 ] LUCENE-6299 : make branch
        Hide
        ASF subversion and git services added a comment -

        Commit 1662504 from Michael McCandless in branch 'dev/branches/lucene6299'
        [ https://svn.apache.org/r1662504 ]

        LUCENE-6299: commit rob's last patch

        Show
        ASF subversion and git services added a comment - Commit 1662504 from Michael McCandless in branch 'dev/branches/lucene6299' [ https://svn.apache.org/r1662504 ] LUCENE-6299 : commit rob's last patch
        Hide
        Robert Muir added a comment -

        OK, you mean something like pendingNumDocs.addAndGet(-pendingNumDocs.get())?

        No, i mean why can't we compute it from the segments we drop and just subtract that?

        Show
        Robert Muir added a comment - OK, you mean something like pendingNumDocs.addAndGet(-pendingNumDocs.get())? No, i mean why can't we compute it from the segments we drop and just subtract that?
        Hide
        Michael McCandless added a comment -

        No, i mean why can't we compute it from the segments we drop and just subtract that?

        OK I committed a change on the branch to do this ... and put addIndexes back to "best-effort up front and true reserveDocs just before changing SIS".

        Show
        Michael McCandless added a comment - No, i mean why can't we compute it from the segments we drop and just subtract that? OK I committed a change on the branch to do this ... and put addIndexes back to "best-effort up front and true reserveDocs just before changing SIS".
        Hide
        ASF subversion and git services added a comment -

        Commit 1662544 from Michael McCandless in branch 'dev/branches/lucene6299'
        [ https://svn.apache.org/r1662544 ]

        LUCENE-6299: fix deleteAll to deduct from pendingNumDocs more carefully; fix addIndexes to do best-effort check up front and only reserve for real just before changing SIS

        Show
        ASF subversion and git services added a comment - Commit 1662544 from Michael McCandless in branch 'dev/branches/lucene6299' [ https://svn.apache.org/r1662544 ] LUCENE-6299 : fix deleteAll to deduct from pendingNumDocs more carefully; fix addIndexes to do best-effort check up front and only reserve for real just before changing SIS
        Hide
        Robert Muir added a comment -

        +1

        Show
        Robert Muir added a comment - +1
        Hide
        ASF subversion and git services added a comment -

        Commit 1662571 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1662571 ]

        LUCENE-6299: IndexWriter was failing to enforce the 2.1 billion doc limit in one index

        Show
        ASF subversion and git services added a comment - Commit 1662571 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1662571 ] LUCENE-6299 : IndexWriter was failing to enforce the 2.1 billion doc limit in one index
        Hide
        ASF subversion and git services added a comment -

        Commit 1662576 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1662576 ]

        LUCENE-6299: IndexWriter was failing to enforce the 2.1 billion doc limit

        Show
        ASF subversion and git services added a comment - Commit 1662576 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1662576 ] LUCENE-6299 : IndexWriter was failing to enforce the 2.1 billion doc limit
        Hide
        ASF subversion and git services added a comment -

        Commit 1662654 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1662654 ]

        LUCENE-6299: IndexWriter was failing to enforce the 2.1 billion doc limit

        Show
        ASF subversion and git services added a comment - Commit 1662654 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1662654 ] LUCENE-6299 : IndexWriter was failing to enforce the 2.1 billion doc limit
        Hide
        Michael McCandless added a comment -

        Argh: the CHANGES.txt entry for this issue went into the wrong place in 4.10.x, even though the fix was in fact committed for 4.10.4....

        Show
        Michael McCandless added a comment - Argh: the CHANGES.txt entry for this issue went into the wrong place in 4.10.x, even though the fix was in fact committed for 4.10.4....
        Hide
        Michael McCandless added a comment -

        Bulk close for 4.10.4 release

        Show
        Michael McCandless added a comment - Bulk close for 4.10.4 release

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development