Lucene - Core
  1. Lucene - Core
  2. LUCENE-5544

exceptions during IW.rollback can leak files and locks

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7.1, 4.8, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today, rollback() doesn't always succeed: if it does, it closes the writer nicely. otherwise, if it hits exception, it leaves you with a half-broken writer, still potentially holding file handles and write lock.

      This is especially bad if you use Native locks, because you are kind of hosed, the static map prevents you from forcefully unlocking (e.g. IndexWriter.unlock) so you have no real course of action to try to recover.

      If rollback() hits exception, it should still deliver the exception, but release things (e.g. like IOUtils.close).

      1. LUCENE-5544.patch
        8 kB
        Robert Muir
      2. LUCENE-5544.patch
        6 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          Here's the start to a patch. Really the current rollback code is too crazy, there is no need for it to call the super-scary closeInternal(false, false) at the end, when in this case all that huge complicated piece of code is doing, is just calling close() on IndexFileDeleter and releasing write.lock.

          Show
          Robert Muir added a comment - Here's the start to a patch. Really the current rollback code is too crazy, there is no need for it to call the super-scary closeInternal(false, false) at the end, when in this case all that huge complicated piece of code is doing, is just calling close() on IndexFileDeleter and releasing write.lock.
          Hide
          Shai Erera added a comment -

          Patch looks good. So basically with this patch, the state of IW after rollback() is it's always closed() and doesn't leak any important resources like write.lock and pooled readers. And there's no way to continue using this instance - app must create a new IW instance. We can still end up w/ a segments_N file in the directory though (if its delete failed), but I guess IW will detect it's corrupt and use the one from the previous commit.

          About the test, maybe instead of asserting that IW.isLocked == false, try to open a new IW? I guess it will fail if you remove the stuff that you added to the finally clause? That will guarantee that we test what the app is likely to do after calling rollback().

          And also, do you think it's better to use MDW.failOn to randomly fail if we're somewhere in rollback() stack? Cause currently the test fails only in one of two places. Just thinking about making the test more evil.

          Show
          Shai Erera added a comment - Patch looks good. So basically with this patch, the state of IW after rollback() is it's always closed() and doesn't leak any important resources like write.lock and pooled readers. And there's no way to continue using this instance - app must create a new IW instance. We can still end up w/ a segments_N file in the directory though (if its delete failed), but I guess IW will detect it's corrupt and use the one from the previous commit. About the test, maybe instead of asserting that IW.isLocked == false, try to open a new IW? I guess it will fail if you remove the stuff that you added to the finally clause? That will guarantee that we test what the app is likely to do after calling rollback(). And also, do you think it's better to use MDW.failOn to randomly fail if we're somewhere in rollback() stack? Cause currently the test fails only in one of two places. Just thinking about making the test more evil.
          Hide
          Robert Muir added a comment -

          About the test, maybe instead of asserting that IW.isLocked == false, try to open a new IW? I guess it will fail if you remove the stuff that you added to the finally clause? That will guarantee that we test what the app is likely to do after calling rollback().

          Well the current test doesnt even need that assert: its just for clarity. we dont need an assert for this stuff at all: the last line of directory.close() (MDW) will fail if there are open locks or files!

          And also, do you think it's better to use MDW.failOn to randomly fail if we're somewhere in rollback() stack? Cause currently the test fails only in one of two places. Just thinking about making the test more evil.

          This is a good idea.

          Show
          Robert Muir added a comment - About the test, maybe instead of asserting that IW.isLocked == false, try to open a new IW? I guess it will fail if you remove the stuff that you added to the finally clause? That will guarantee that we test what the app is likely to do after calling rollback(). Well the current test doesnt even need that assert: its just for clarity. we dont need an assert for this stuff at all: the last line of directory.close() (MDW) will fail if there are open locks or files! And also, do you think it's better to use MDW.failOn to randomly fail if we're somewhere in rollback() stack? Cause currently the test fails only in one of two places. Just thinking about making the test more evil. This is a good idea.
          Hide
          Shai Erera added a comment -

          Just thinking about making the test more evil.

          Though if the exception happens from Lock.close(), the lock will still exist and the test will fail asserting that the writer isn't locked. It's a valid exception but nothing we can do about it while calling rollback(). So maybe exclude it from the list of allowed places to fail.

          Do you think it's better to not swallow the exceptions in the finally part, but add them as suppressed to any original exception? Because if e.g. lock.close() fails, app won't be able to open a new writer, yet all it has as info is the original exception that happened during rollback(), and no info that the lock couldn't be released either.

          Show
          Shai Erera added a comment - Just thinking about making the test more evil. Though if the exception happens from Lock.close(), the lock will still exist and the test will fail asserting that the writer isn't locked. It's a valid exception but nothing we can do about it while calling rollback(). So maybe exclude it from the list of allowed places to fail. Do you think it's better to not swallow the exceptions in the finally part, but add them as suppressed to any original exception? Because if e.g. lock.close() fails, app won't be able to open a new writer, yet all it has as info is the original exception that happened during rollback(), and no info that the lock couldn't be released either.
          Hide
          Robert Muir added a comment -

          You can definitely call it multiple times, and some tests in fact do just that. Thats why IOUtils.close() is used, which does nothing on a null parameter.

          Show
          Robert Muir added a comment - You can definitely call it multiple times, and some tests in fact do just that. Thats why IOUtils.close() is used, which does nothing on a null parameter.
          Hide
          Michael McCandless added a comment -

          +1

          Maybe add @Override to DocumentsWriter.close since it implements
          Closeable now?

          Show
          Michael McCandless added a comment - +1 Maybe add @Override to DocumentsWriter.close since it implements Closeable now?
          Hide
          Simon Willnauer added a comment -

          I think the patch looks awesome.. yet here are a couple of comments:

          In rollback we should notify with a try / finally since processEvents could throw an exception:

            try {
              processEvents(false, true);
            } finally {
              notifyAll();
            }
          

          I am not 100% sure about this but I think we need / should refresh the delete after processing events since it can publish pending flushes etc?

                   deleter.refresh();
           
                   lastCommitChangeCount = changeCount;
          
                   processEvents(false, true);
                   deleter.refresh();  // refresh here again after processing events?
                   deleter.close();
          

          maybe the exception message should be "BOOM!"

          thanks for doing this

          Show
          Simon Willnauer added a comment - I think the patch looks awesome.. yet here are a couple of comments: In rollback we should notify with a try / finally since processEvents could throw an exception: try { processEvents( false , true ); } finally { notifyAll(); } I am not 100% sure about this but I think we need / should refresh the delete after processing events since it can publish pending flushes etc? deleter.refresh(); lastCommitChangeCount = changeCount; processEvents( false , true ); deleter.refresh(); // refresh here again after processing events? deleter.close(); maybe the exception message should be "BOOM!" thanks for doing this
          Hide
          Robert Muir added a comment -

          Updated patch incorporating feedback: adding a 2nd test to produce random exceptions where we do i/o, addressing simon's bugs (thanks), and adding missing override.

          We can actually still assert the same things in the new 2nd test, because MDW doesn't throw deterministic exceptions from lock release (This would probably wreak havoc on the test framework if you think about it).

          Will commit shortly.

          Show
          Robert Muir added a comment - Updated patch incorporating feedback: adding a 2nd test to produce random exceptions where we do i/o, addressing simon's bugs (thanks), and adding missing override. We can actually still assert the same things in the new 2nd test, because MDW doesn't throw deterministic exceptions from lock release (This would probably wreak havoc on the test framework if you think about it). Will commit shortly.
          Hide
          ASF subversion and git services added a comment -

          Commit 1579975 from Robert Muir in branch 'dev/trunk'
          [ https://svn.apache.org/r1579975 ]

          LUCENE-5544: Exceptions during IW.rollback can leak files and locks

          Show
          ASF subversion and git services added a comment - Commit 1579975 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1579975 ] LUCENE-5544 : Exceptions during IW.rollback can leak files and locks
          Hide
          ASF subversion and git services added a comment -

          Commit 1579978 from Robert Muir in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1579978 ]

          LUCENE-5544: Exceptions during IW.rollback can leak files and locks

          Show
          ASF subversion and git services added a comment - Commit 1579978 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1579978 ] LUCENE-5544 : Exceptions during IW.rollback can leak files and locks
          Hide
          ASF subversion and git services added a comment -

          Commit 1579983 from Robert Muir in branch 'dev/branches/lucene_solr_4_7'
          [ https://svn.apache.org/r1579983 ]

          LUCENE-5544: Exceptions during IW.rollback can leak files and locks

          Show
          ASF subversion and git services added a comment - Commit 1579983 from Robert Muir in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1579983 ] LUCENE-5544 : Exceptions during IW.rollback can leak files and locks
          Hide
          ASF subversion and git services added a comment -

          Commit 1583439 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1583439 ]

          LUCENE-5544: disregard leftover events after rollback has finished

          Show
          ASF subversion and git services added a comment - Commit 1583439 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1583439 ] LUCENE-5544 : disregard leftover events after rollback has finished
          Hide
          ASF subversion and git services added a comment -

          Commit 1583440 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1583440 ]

          LUCENE-5544: disregard leftover events after rollback has finished

          Show
          ASF subversion and git services added a comment - Commit 1583440 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1583440 ] LUCENE-5544 : disregard leftover events after rollback has finished
          Hide
          Steve Rowe added a comment -

          Bulk close 4.7.1 issues

          Show
          Steve Rowe added a comment - Bulk close 4.7.1 issues

            People

            • Assignee:
              Unassigned
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development