Lucene - Core
  1. Lucene - Core
  2. LUCENE-2104

IndexWriter.unlock does does nothing if NativeFSLockFactory is used

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9, 2.9.1, 3.0
    • Fix Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      If NativeFSLockFactory is used, IndexWriter.unlock will return, silently doing nothing. The reason is that NativeFSLockFactory's makeLock always creates a new NativeFSLock. NativeFSLock's release first checks if its lock is not null. However, only if obtain() is called, that lock is not null. So release actually does nothing, and so IndexWriter.unlock does not delete the lock, or fail w/ exception.
      This is only a problem in NativeFSLock, and not in other Lock implementations, at least as I was able to see.

      Need to think first how to reproduce in a test, and then fix it. I'll work on it.

      1. LUCENE-2104.patch
        3 kB
        Shai Erera
      2. LUCENE-2104.patch
        3 kB
        Shai Erera
      3. LUCENE-2104.patch
        3 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        I think that if I move those lines (in NativeFSLock.release()):

              if (!path.delete())
                throw new LockReleaseFailedException("failed to delete " + path);
        

        to outside the if(lockExists()) section, this should work? Because then the new NativeFSLock will attempt to release an lock that's held by someone else, and fail. If the lock exists for some reason, but nobody is holding it, that line should succeed.

        In order to test it, I think I'll need to spawn two processes, which is trickier. Let me know what you think about the fix in the meantime.

        Show
        Shai Erera added a comment - I think that if I move those lines (in NativeFSLock.release()): if (!path.delete()) throw new LockReleaseFailedException( "failed to delete " + path); to outside the if(lockExists()) section, this should work? Because then the new NativeFSLock will attempt to release an lock that's held by someone else, and fail. If the lock exists for some reason, but nobody is holding it, that line should succeed. In order to test it, I think I'll need to spawn two processes, which is trickier. Let me know what you think about the fix in the meantime.
        Hide
        Shai Erera added a comment -

        Patch includes a test in TestLockFactory and fix in NativeFSLock.

        Show
        Shai Erera added a comment - Patch includes a test in TestLockFactory and fix in NativeFSLock.
        Hide
        Uwe Schindler added a comment - - edited

        In general, there is no problem with NativeFSLock at all, because the removal of the lock file is just a "cleanup", which is not needed (not deleting the lock file would produce no problem - we only delete it to be interoperable with SimpleFSLockFactory). The existence of a lock file is no indice for a locked directory, only if a lock is "on the file". The JVM normally automatically removes lock files on exit.

        Removing the lock file to "unlock" is also a no-op, because even if the lock file is removed, the lock still exists (unix delete on last close semantics). The unlock method in IndexWriter is just for SimpleFSLockFactory and is of no use for any other lock factory.

        Show
        Uwe Schindler added a comment - - edited In general, there is no problem with NativeFSLock at all, because the removal of the lock file is just a "cleanup", which is not needed (not deleting the lock file would produce no problem - we only delete it to be interoperable with SimpleFSLockFactory). The existence of a lock file is no indice for a locked directory, only if a lock is "on the file". The JVM normally automatically removes lock files on exit. Removing the lock file to "unlock" is also a no-op, because even if the lock file is removed, the lock still exists (unix delete on last close semantics). The unlock method in IndexWriter is just for SimpleFSLockFactory and is of no use for any other lock factory.
        Hide
        Shai Erera added a comment -

        Uwe, the problem is that unlock does nothing when NativeFSLock is used. So I open an IndexWriter and receive a LockObtainFailedException, and then call unlock because I'm sure no one else holds the lock, I expect to be able to open the writer after unlock returns. With NativeFSLock, I cannot fail to obtain the lock if it's there but nobody uses it, so if I hit the exception, I shouldn't be allowed to call unlock.

        If you use SimpleFSLockFactory and call IndexWriter.unlock(), but SimpleFSLock.release() fails, you hit an exception. With NativeFSLock it's not the same, simply because NativeFSLockFactory.makeLock creates a new instance with a 'lock = null', and therefore calling release() is always a no-op.

        I think the symmetry is broken and that's what the issue and patch attempt to fix.

        Show
        Shai Erera added a comment - Uwe, the problem is that unlock does nothing when NativeFSLock is used. So I open an IndexWriter and receive a LockObtainFailedException, and then call unlock because I'm sure no one else holds the lock, I expect to be able to open the writer after unlock returns. With NativeFSLock, I cannot fail to obtain the lock if it's there but nobody uses it, so if I hit the exception, I shouldn't be allowed to call unlock. If you use SimpleFSLockFactory and call IndexWriter.unlock(), but SimpleFSLock.release() fails, you hit an exception. With NativeFSLock it's not the same, simply because NativeFSLockFactory.makeLock creates a new instance with a 'lock = null', and therefore calling release() is always a no-op. I think the symmetry is broken and that's what the issue and patch attempt to fix.
        Hide
        Uwe Schindler added a comment -

        The problem with NativeFSLock is that you cannot release the lock if you are not owning the lock. It is the same like with synchronized() blocks, you can only unlock, if you own the lock.

        If you use NativeFSLock you should never-ever try to forcefully unlock! If there is a lock, then the dir is used and it would be the badest thing you could do, to forcefully unlock. The O/S will take care of lock removal on JVM exit.

        I just repeat: IndexWriter.unlock() is only a last resort to forcefully remove a lock file from SimpleFSLockFactory because the cleanup does not work. And I want to repeat: Removing the lock file does not remove the lock. You test e.g. fails on windows even with you patch, because Windows forcefully keeps the lock and does not allow to remove the lock file until it is explicitely unlocked.

        What you patch does on UNIX is, that it removes the lock file and therefore all references in filesystem to the lock. If another index writer then tries to lock again, it works, because the new file has another inode number and is a different lock file. What you are tryng to do is completely broken.

        Show
        Uwe Schindler added a comment - The problem with NativeFSLock is that you cannot release the lock if you are not owning the lock. It is the same like with synchronized() blocks, you can only unlock, if you own the lock. If you use NativeFSLock you should never-ever try to forcefully unlock! If there is a lock, then the dir is used and it would be the badest thing you could do, to forcefully unlock. The O/S will take care of lock removal on JVM exit. I just repeat: IndexWriter.unlock() is only a last resort to forcefully remove a lock file from SimpleFSLockFactory because the cleanup does not work. And I want to repeat: Removing the lock file does not remove the lock. You test e.g. fails on windows even with you patch, because Windows forcefully keeps the lock and does not allow to remove the lock file until it is explicitely unlocked. What you patch does on UNIX is, that it removes the lock file and therefore all references in filesystem to the lock. If another index writer then tries to lock again, it works, because the new file has another inode number and is a different lock file. What you are tryng to do is completely broken.
        Hide
        Uwe Schindler added a comment -

        The fix would be to add a javadocs comment to IW.unlock, that this method only works and is a hack just for SimpleFSLockFactory. The existence of a lock file in native fs lock is not the presence of a lock. The file is just a helper, to have a lock "namespace". Even if the file exists, the lock may not be obtained.

        Show
        Uwe Schindler added a comment - The fix would be to add a javadocs comment to IW.unlock, that this method only works and is a hack just for SimpleFSLockFactory. The existence of a lock file in native fs lock is not the presence of a lock. The file is just a helper, to have a lock "namespace". Even if the file exists, the lock may not be obtained.
        Hide
        Shai Erera added a comment -

        Thanks for the info Uwe. I know the unlock is to be used as a last resort, and if you know no one else has a reference to the lock. The fact that on Windows this fails is perfect, that's exactly what I want - for unlock() to fail if someone keeps a reference to the lock. On Unix - I didn't realize that's what happens, so that's indeed broken.

        I'm not sure that the fix is just javadocs ... perhaps the fix should be to add an unlock() method to Lock and impl it in SimpleFSLock by calling release(), but on NativeFSLock to first obtain the lock and if that succeeds, release it. That way, the obtain would fail and we can throw an exception. Also, we can keep release() and impl in NativeFSLock to first obtain if it does not hold the lock.

        Also, what you say about IndexWriter.unlock() should be used if SimpleFSLockFactory is used, and only then, is not: (1) documented anywhere and (2) reasonable. What if I implement a Lock over a DB (I think someone even posted something about that a while ago). I should still be able to call unlock().

        The thing is, IMO unlock() should throw an exception if it failed, and currently it does so in SimpleFSLock but not for NativeFSLock. The symmetry is broken, and I see no reason why it shouldn't fail for NativeFSLock, so the application knows about it. Notice that NativeFSLock fails to thrown an exception only because makeLock creates a new instance with the 'lock' member being null. It gives a false impression that the unlock succeeded, and for the wrong reason.

        Show
        Shai Erera added a comment - Thanks for the info Uwe. I know the unlock is to be used as a last resort, and if you know no one else has a reference to the lock. The fact that on Windows this fails is perfect, that's exactly what I want - for unlock() to fail if someone keeps a reference to the lock. On Unix - I didn't realize that's what happens, so that's indeed broken. I'm not sure that the fix is just javadocs ... perhaps the fix should be to add an unlock() method to Lock and impl it in SimpleFSLock by calling release(), but on NativeFSLock to first obtain the lock and if that succeeds, release it. That way, the obtain would fail and we can throw an exception. Also, we can keep release() and impl in NativeFSLock to first obtain if it does not hold the lock. Also, what you say about IndexWriter.unlock() should be used if SimpleFSLockFactory is used, and only then, is not: (1) documented anywhere and (2) reasonable. What if I implement a Lock over a DB (I think someone even posted something about that a while ago). I should still be able to call unlock(). The thing is, IMO unlock() should throw an exception if it failed, and currently it does so in SimpleFSLock but not for NativeFSLock. The symmetry is broken, and I see no reason why it shouldn't fail for NativeFSLock, so the application knows about it. Notice that NativeFSLock fails to thrown an exception only because makeLock creates a new instance with the 'lock' member being null. It gives a false impression that the unlock succeeded, and for the wrong reason.
        Hide
        Michael McCandless added a comment -

        Seems like two things are being discussed here:

        • Should IW.unlock be able to forcefully release a still-in-use
          lock? For SimpleFSLockFactory we have no choice but to allow
          this, since it's unable to tell if the lock is "really" still
          held. But since NativeFSLockFactory can tell, I don't think we
          should remove a still-in-use lock? Seems like we leave the choice
          to the LF, eg for someone's external LF, it should decide if
          forceful removal is allowed.
        • If forceful removal is not allowed, should we throw an exception
          noting that you failed to forcefully remove the lock? Seems like
          we should? Ie, somehow, NativeFSLockFactory should try to acquire
          the lock, and if it was already locked, should throw an exception
          saying you cannot release it?
        Show
        Michael McCandless added a comment - Seems like two things are being discussed here: Should IW.unlock be able to forcefully release a still-in-use lock? For SimpleFSLockFactory we have no choice but to allow this, since it's unable to tell if the lock is "really" still held. But since NativeFSLockFactory can tell, I don't think we should remove a still-in-use lock? Seems like we leave the choice to the LF, eg for someone's external LF, it should decide if forceful removal is allowed. If forceful removal is not allowed, should we throw an exception noting that you failed to forcefully remove the lock? Seems like we should? Ie, somehow, NativeFSLockFactory should try to acquire the lock, and if it was already locked, should throw an exception saying you cannot release it?
        Hide
        Uwe Schindler added a comment -

        +1 on exception.

        The problem: NativeFSLockFactory is never possible to get the lock and release it. This is only allowed for the code, that holds the lock (its the same like with a sysnchonized mutex using j.l.concurrent.ReentrantLock, only code that holds the lock can free it). So it is impossible to remove a lock, held by another process but it may be possible to release if it is created in the same JVM instance (for this to work, the factory needs a map of all generated locks).

        Show
        Uwe Schindler added a comment - +1 on exception. The problem: NativeFSLockFactory is never possible to get the lock and release it. This is only allowed for the code, that holds the lock (its the same like with a sysnchonized mutex using j.l.concurrent.ReentrantLock, only code that holds the lock can free it). So it is impossible to remove a lock, held by another process but it may be possible to release if it is created in the same JVM instance (for this to work, the factory needs a map of all generated locks).
        Hide
        Shai Erera added a comment -

        Thanks for the comments. I proposed this above:

        perhaps the fix should be to add an unlock() method to Lock and impl it in SimpleFSLock by calling release(), but on NativeFSLock to first obtain the lock and if that succeeds, release it. That way, the obtain would fail and we can throw an exception. Also, we can keep release() and impl in NativeFSLock to first obtain if it does not hold the lock.

        I vote for the second option (i.e. not adding another API, but use what's there and include a "first obtain then release" logic in an 'else' part in NativeFSLock).

        Show
        Shai Erera added a comment - Thanks for the comments. I proposed this above: perhaps the fix should be to add an unlock() method to Lock and impl it in SimpleFSLock by calling release(), but on NativeFSLock to first obtain the lock and if that succeeds, release it. That way, the obtain would fail and we can throw an exception. Also, we can keep release() and impl in NativeFSLock to first obtain if it does not hold the lock. I vote for the second option (i.e. not adding another API, but use what's there and include a "first obtain then release" logic in an 'else' part in NativeFSLock).
        Hide
        Shai Erera added a comment -

        Changed NativeFSLock.release() to first obtain and then release the lock, if it's not held. That way, if obtain fails, LockReleaseFailedException is thrown, to indicate that.

        Show
        Shai Erera added a comment - Changed NativeFSLock.release() to first obtain and then release the lock, if it's not held. That way, if obtain fails, LockReleaseFailedException is thrown, to indicate that.
        Hide
        Shai Erera added a comment -

        How do we proceed with this? Can someone (Uwe/Mike?) plz review the latest patch I added?

        Show
        Shai Erera added a comment - How do we proceed with this? Can someone (Uwe/Mike?) plz review the latest patch I added?
        Hide
        Uwe Schindler added a comment -

        For me the latest patch looks good as it no longer enforces to unlock (which is not possible with NativeFSLock) and throws Exception. The message should not include other processs, as the test verifies that it also fails in the same process, if the file is locked. Something like "Cannot forcefully unlock a NativeFSLock which is held by another indexer component.".

        Show
        Uwe Schindler added a comment - For me the latest patch looks good as it no longer enforces to unlock (which is not possible with NativeFSLock) and throws Exception. The message should not include other processs, as the test verifies that it also fails in the same process, if the file is locked. Something like "Cannot forcefully unlock a NativeFSLock which is held by another indexer component.".
        Hide
        Shai Erera added a comment -

        Changed the sentence and updated CHANGES.

        Show
        Shai Erera added a comment - Changed the sentence and updated CHANGES.
        Hide
        Michael McCandless added a comment -

        Patch looks good! Thanks Shai.

        Show
        Michael McCandless added a comment - Patch looks good! Thanks Shai.
        Hide
        Uwe Schindler added a comment -

        Committed revision 891205 with the "== false" check changed to "!"

        Thanks Shai!

        Show
        Uwe Schindler added a comment - Committed revision 891205 with the "== false" check changed to "!" Thanks Shai!
        Hide
        Uwe Schindler added a comment -

        Should I backport?

        Show
        Uwe Schindler added a comment - Should I backport?
        Hide
        Michael McCandless added a comment -

        Should I backport?

        No – it was already fixed on 3x (I just marked it as such in the issue), probably because we branched 3x off after this was committed.

        Show
        Michael McCandless added a comment - Should I backport? No – it was already fixed on 3x (I just marked it as such in the issue), probably because we branched 3x off after this was committed.
        Hide
        Michael McCandless added a comment -

        backport

        Show
        Michael McCandless added a comment - backport

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development