Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-1050

SimpleFSLockFactory ignores error on deleting the lock file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.3
    • Component/s: core/store
    • Labels:
      None

      Description

      Spinoff from here:

      http://www.gossamer-threads.com/lists/lucene/java-user/54438

      The Lock.release for SimpleFSLockFactory ignores the return value of lockFile.delete(). I plan to throw a new LockReleaseFailedException, subclassing from IOException, when this returns false. This is a very minor change to backwards compatibility because all methods in Lucene that release a lock already throw IOException.

      1. LUCENE-1050-2.patch
        1 kB
        Grant Ingersoll
      2. LUCENE-1050.patch
        7 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          gsingers Grant Ingersoll added a comment -

          Committed on 612869 and 612868 (branch and trunk)

          Show
          gsingers Grant Ingersoll added a comment - Committed on 612869 and 612868 (branch and trunk)
          Hide
          gsingers Grant Ingersoll added a comment -

          OK, this patch works and passes the test. I will commit to trunk and branch sometime after 12 EST today, so that we can still do the release.

          Show
          gsingers Grant Ingersoll added a comment - OK, this patch works and passes the test. I will commit to trunk and branch sometime after 12 EST today, so that we can still do the release.
          Hide
          mikemccand Michael McCandless added a comment -

          Grant, you just have to change that test to not assert that the writer2.close hit an exception, because according to these new semantics, it should NOT hit an exception since it is releasing a lock that it no longer holds. If you fix that do all other tests pass?

          Actually, I'd make that stronger: you should invert the assertion in that test. Ie, assert that no exception is hit on writer2.close(), to make sure double-releasing a lock never throws an exception in the future.

          Show
          mikemccand Michael McCandless added a comment - Grant, you just have to change that test to not assert that the writer2.close hit an exception, because according to these new semantics, it should NOT hit an exception since it is releasing a lock that it no longer holds. If you fix that do all other tests pass? Actually, I'd make that stronger: you should invert the assertion in that test. Ie, assert that no exception is hit on writer2.close(), to make sure double-releasing a lock never throws an exception in the future.
          Hide
          mikemccand Michael McCandless added a comment -

          I agree, we should fix it so that if you call release but you do not hold the lock then no exception is thrown.

          But, if you cal l release and you do hold the lock and the release fails then we should throw the exception.

          In fact, NativeFSLockFactory already handles it this way.

          Grant, you just have to change that test to not assert that the writer2.close hit an exception, because according to these new semantics, it should NOT hit an exception since it is releasing a lock that it no longer holds. If you fix that do all other tests pass?

          Show
          mikemccand Michael McCandless added a comment - I agree, we should fix it so that if you call release but you do not hold the lock then no exception is thrown. But, if you cal l release and you do hold the lock and the release fails then we should throw the exception. In fact, NativeFSLockFactory already handles it this way. Grant, you just have to change that test to not assert that the writer2.close hit an exception, because according to these new semantics, it should NOT hit an exception since it is releasing a lock that it no longer holds. If you fix that do all other tests pass?
          Hide
          gsingers Grant Ingersoll added a comment -

          Hmm, it seems putting in lockFile.exists() check causes the test to fail...
          Testcase: testFSDirectoryTwoCreates(org.apache.lucene.store.TestLockFactory): FAILED
          [junit] writer2.close() should have hit LockReleaseFailedException
          [junit] junit.framework.AssertionFailedError: writer2.close() should have hit LockReleaseFailedException
          [junit] at org.apache.lucene.store.TestLockFactory.testFSDirectoryTwoCreates(TestLockFactory.java:193)

          But, I guess I don't understand why it is failing there, since, based on the comments it says writer2 SHOULD be able to obtain the lock, so why would it then be expected to fail to release it if it already owns it?

          I guess I get to learn about locks now...

          Show
          gsingers Grant Ingersoll added a comment - Hmm, it seems putting in lockFile.exists() check causes the test to fail... Testcase: testFSDirectoryTwoCreates(org.apache.lucene.store.TestLockFactory): FAILED [junit] writer2.close() should have hit LockReleaseFailedException [junit] junit.framework.AssertionFailedError: writer2.close() should have hit LockReleaseFailedException [junit] at org.apache.lucene.store.TestLockFactory.testFSDirectoryTwoCreates(TestLockFactory.java:193) But, I guess I don't understand why it is failing there, since, based on the comments it says writer2 SHOULD be able to obtain the lock, so why would it then be expected to fail to release it if it already owns it? I guess I get to learn about locks now...
          Hide
          gsingers Grant Ingersoll added a comment -

          This should be fixed in 2.3.

          Show
          gsingers Grant Ingersoll added a comment - This should be fixed in 2.3.
          Hide
          gsingers Grant Ingersoll added a comment -

          I agree, it can be fixed by the SpellChecker, but it still seems like an error to throw an exception just b/c you try to delete something that doesn't exist, especially since the release() mechanism doesn't say what will happen if it is called when a lock doesn't exist.

          The fix in the lock is really simple, too:

          if (lockFile.exists() && !lockFile.delete()){
             throw...
          }
          

          I vote for fixing both cases.

          Show
          gsingers Grant Ingersoll added a comment - I agree, it can be fixed by the SpellChecker, but it still seems like an error to throw an exception just b/c you try to delete something that doesn't exist, especially since the release() mechanism doesn't say what will happen if it is called when a lock doesn't exist. The fix in the lock is really simple, too: if (lockFile.exists() && !lockFile.delete()){ throw ... } I vote for fixing both cases.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          FYI, I just verified that Solr does this correctly for the main index (since we'll be upgrading lucene versions soon)

                  if (IndexReader.isLocked(dir)) {
                    log.warning(logid+"WARNING: Solr index directory '" + getIndexDir() + "' is locked.  Unlocking...");
                    IndexReader.unlock(dir);
                  }
          
          Show
          yseeley@gmail.com Yonik Seeley added a comment - FYI, I just verified that Solr does this correctly for the main index (since we'll be upgrading lucene versions soon) if (IndexReader.isLocked(dir)) { log.warning(logid+ "WARNING: Solr index directory '" + getIndexDir() + "' is locked. Unlocking..." ); IndexReader.unlock(dir); }
          Hide
          hossman Hoss Man added a comment -

          Grant: my take on this is that SpellChecker.clearIndex is in the wrong. it shouldn't be calling unlock unless it has reason to think there is a "stale lock" that needs to be closed – ie: this is a bug in SpellChecker that you have only discovered because this bug LUCENE-1050 was fixed.

          I would suggest a new issue for tracking, and a patch in which SpellChecker.clearIndex doesn't call unlock unless isLocked returns true. Even then, it might make sense to catch and ignore LockReleaseFailedException and let whatever resulting exception may originate from "new IndexWriter" be returned.

          Show
          hossman Hoss Man added a comment - Grant: my take on this is that SpellChecker.clearIndex is in the wrong. it shouldn't be calling unlock unless it has reason to think there is a "stale lock" that needs to be closed – ie: this is a bug in SpellChecker that you have only discovered because this bug LUCENE-1050 was fixed. I would suggest a new issue for tracking, and a patch in which SpellChecker.clearIndex doesn't call unlock unless isLocked returns true. Even then, it might make sense to catch and ignore LockReleaseFailedException and let whatever resulting exception may originate from "new IndexWriter" be returned.
          Hide
          gsingers Grant Ingersoll added a comment -

          I'm getting an exception here, when using in Solr to create the Spell checking index.

          It seems the SpellChecker is telling the IndexReader to delete the lockFile, but the lockFile doesn't exist.

          SEVERE: org.apache.lucene.store.LockReleaseFailedException: failed to delete <path to>/solr/data/spell/write.lock
          at org.apache.lucene.store.SimpleFSLock.release(SimpleFSLockFactory.java:149)
          at org.apache.lucene.index.IndexReader.unlock(IndexReader.java:882)
          at org.apache.lucene.search.spell.SpellChecker.clearIndex(SpellChecker.java:287)
          at org.apache.solr.handler.SpellCheckerRequestHandler.rebuild(SpellCheckerRequestHandler.java:390)
          at org.apache.solr.handler.SpellCheckerRequestHandler.handleRequestBody(SpellCheckerRequestHandler.java:272)
          at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
          at org.apache.solr.core.SolrCore.execute(SolrCore.java:902)

          I don't know much about the locking mechanism, but it seems like this should check to see if the lockFile exists before trying to delete it.

          I am running this on OS X.

          Show
          gsingers Grant Ingersoll added a comment - I'm getting an exception here, when using in Solr to create the Spell checking index. It seems the SpellChecker is telling the IndexReader to delete the lockFile, but the lockFile doesn't exist. SEVERE: org.apache.lucene.store.LockReleaseFailedException: failed to delete <path to>/solr/data/spell/write.lock at org.apache.lucene.store.SimpleFSLock.release(SimpleFSLockFactory.java:149) at org.apache.lucene.index.IndexReader.unlock(IndexReader.java:882) at org.apache.lucene.search.spell.SpellChecker.clearIndex(SpellChecker.java:287) at org.apache.solr.handler.SpellCheckerRequestHandler.rebuild(SpellCheckerRequestHandler.java:390) at org.apache.solr.handler.SpellCheckerRequestHandler.handleRequestBody(SpellCheckerRequestHandler.java:272) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117) at org.apache.solr.core.SolrCore.execute(SolrCore.java:902) I don't know much about the locking mechanism, but it seems like this should check to see if the lockFile exists before trying to delete it. I am running this on OS X.
          Hide
          mikemccand Michael McCandless added a comment -

          I just committed this.

          Show
          mikemccand Michael McCandless added a comment - I just committed this.
          Hide
          mikemccand Michael McCandless added a comment -

          Simple patch implementing above approach. I plan to commit in a day or two.

          Show
          mikemccand Michael McCandless added a comment - Simple patch implementing above approach. I plan to commit in a day or two.

            People

            • Assignee:
              gsingers Grant Ingersoll
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development