Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-678

[PATCH] LockFactory implementation based on OS native locks (java.nio.*)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.1
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      The current default locking for FSDirectory is SimpleFSLockFactory.
      It uses java.io.File.createNewFile for its locking, which has this
      spooky warning in Sun's javadocs:

      Note: this method should not be used for file-locking, as the
      resulting protocol cannot be made to work reliably. The FileLock
      facility should be used instead.

      So, this patch provides a LockFactory implementation based on FileLock
      (using java.nio.*).

      All unit tests pass with this patch, on OS X (10.4.8), Linux (Ubuntu
      6.06), and Windows XP SP2.

      Another benefit of native locks is the OS automatically frees them if
      the JVM exits before Lucene can free its locks. Many people seem to
      hit this (old lock files still on disk) now.

      I've created this new class:

      org.apache.lucene.store.NativeFSLockFactory

      and added a couple test cases to the existing TestLockFactory.

      I've left SimpleFSLockFactory as the default locking for FSDirectory
      for now. I think we should get some usage / experience with
      NativeFSLockFactory and then later on make it the default locking
      implementation?

      I also tested changing FSDirectory's default locking to
      NativeFSLockFactory and all unit tests still pass (on the above
      platforms).

      One important note about locking over NFS: some NFS servers and/or
      clients do not support it, or, it's a configuration option or mode
      that must be explicitly enabled. When it's misconfigured it's able to
      take a long time (35 seconds in my case) before throwing an exception.
      To handle this, I acquire & release a random test lock on creating the
      NativeFSLockFactory to verify locking is configured properly.

      A few other small changes in the patch:

      • Added a "failure reason" to Lock.java so that in
        obtain(lockWaitTimeout), if there is a persistent IOException
        in trying to obtain the lock, this can be messaged & included in
        the "Lock obtain timed out" that's raised.
      • Corrected javadoc in SimpleFSLockFactory: it previously said the
        wrong system property for overriding lock class via system
        properties
      • Fixed unhandled IOException when opening an IndexWriter for
        create, if the locks dir does not exist (just added
        lockDir.exists() check in clearAllLocks method of
        SimpleFSLockFactory & NativeFSLockFactory.
      • Fixed a few small unrelated issues with TestLockFactory, and
        also fixed tests to accept NativeFSLockFactory as the default
        locking implementation for FSDirectory.
      • Fixed a typo in javadoc in FieldsReader.java
      • Added some more javadoc for the LockFactory.setLockPrefix
      1. LUCENE-678-patch.txt
        20 kB
        Michael McCandless
      2. LUCENE-678-patch2.txt
        16 kB
        Michael McCandless

        Activity

        Hide
        mikemccand Michael McCandless added a comment -

        Has anyone had a chance to look at this patch?

        Show
        mikemccand Michael McCandless added a comment - Has anyone had a chance to look at this patch?
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Committed. Thanks Michael!

        Show
        yseeley@gmail.com Yonik Seeley added a comment - Committed. Thanks Michael!
        Hide
        doronc Doron Cohen added a comment -

        The patch added a call to "writer.close()" in TestLockFactory - testFSDirectoryTwoCreates().
        This is just before the 2nd attempt to create an index writer with override.
        This line should probably be removed, as it cancels the second part of that test case, right?

        Show
        doronc Doron Cohen added a comment - The patch added a call to "writer.close()" in TestLockFactory - testFSDirectoryTwoCreates(). This is just before the 2nd attempt to create an index writer with override. This line should probably be removed, as it cancels the second part of that test case, right?
        Hide
        mikemccand Michael McCandless added a comment -

        I believe it's correct with the line in there (ie, as committed)?

        That test case is verifying that the 2nd index writer indeed removes any leftover lockfiles created by the first one.

        It did not intend to test the case (but previously was) of opening an IndexWriter with create=true while another IndexWriter is already open against that same directory.

        Show
        mikemccand Michael McCandless added a comment - I believe it's correct with the line in there (ie, as committed)? That test case is verifying that the 2nd index writer indeed removes any leftover lockfiles created by the first one. It did not intend to test the case (but previously was) of opening an IndexWriter with create=true while another IndexWriter is already open against that same directory.
        Hide
        doronc Doron Cohen added a comment -

        Michael, I must be misunderstanding something then...

        > That test case is verifying that the 2nd index writer indeed removes
        > any leftover lockfiles created by the first one.

        Can there be any leftovers once the first writer was closed?

        > It did not intend to test the case (but previously was)..

        Could you explain why the change?

        Thanks,
        Doron

        Show
        doronc Doron Cohen added a comment - Michael, I must be misunderstanding something then... > That test case is verifying that the 2nd index writer indeed removes > any leftover lockfiles created by the first one. Can there be any leftovers once the first writer was closed? > It did not intend to test the case (but previously was).. Could you explain why the change? Thanks, Doron
        Hide
        mikemccand Michael McCandless added a comment -

        OK! Sorry, you are correct. Of course on closing the first
        IndexWriter cleanly, it will remove its lock files. Good catch.

        Now I remember (and just re-tested) why I added this line: as a test,
        I temporarily changed FSDirectory's default locking to
        NativeFSLockFactory and then ran all unit tests. This test case
        failed, with IOException on creating that second IndexWriter and my
        [wrong] fix was to close the first writer. But you're right that fix
        is not right because it turns off the test entirely.

        However, I still think the test is incorrect.

        Really what the test is trying to do is "simulate" a crashed writer
        and then verify creating a new writer on that directory works. With
        SimpleFSLockFactory, an effective "simulation" is to leave open the
        first IndexWriter. But with NativeFSLockFactory, it's not, because
        the OS won't let you remove the lock files (that are still held open)
        – this is actually a good thing because in this context it would let
        the programmer know that indeed another writer is still holding the
        write lock.

        Maybe we should remove the writer.close and then disable this test
        when we switch to Native locks as the default [in the future]? Doron
        what do you think?

        UGH, I just found a serious [different] problem with my patch. I'm
        doing too much "sharing" for the locks such that different directories
        will incorrectly share locks! Ugh.

        I'm reopening this so I can fix it properly. Sorry about this.

        Show
        mikemccand Michael McCandless added a comment - OK! Sorry, you are correct. Of course on closing the first IndexWriter cleanly, it will remove its lock files. Good catch. Now I remember (and just re-tested) why I added this line: as a test, I temporarily changed FSDirectory's default locking to NativeFSLockFactory and then ran all unit tests. This test case failed, with IOException on creating that second IndexWriter and my [wrong] fix was to close the first writer. But you're right that fix is not right because it turns off the test entirely. However, I still think the test is incorrect. Really what the test is trying to do is "simulate" a crashed writer and then verify creating a new writer on that directory works. With SimpleFSLockFactory, an effective "simulation" is to leave open the first IndexWriter. But with NativeFSLockFactory, it's not, because the OS won't let you remove the lock files (that are still held open) – this is actually a good thing because in this context it would let the programmer know that indeed another writer is still holding the write lock. Maybe we should remove the writer.close and then disable this test when we switch to Native locks as the default [in the future] ? Doron what do you think? UGH, I just found a serious [different] problem with my patch. I'm doing too much "sharing" for the locks such that different directories will incorrectly share locks! Ugh. I'm reopening this so I can fix it properly. Sorry about this.
        Hide
        mikemccand Michael McCandless added a comment -

        OK, take two.

        First off, I added a new test case to TestLockFactory to catch
        "improper lock / lock ID sharing across different directories". This
        test case fails with the currently committed version of
        NativeFSLockFactory (ie, catches the bug I had sadly created).

        Then, I fixed the bug in NativeFSLockFactory, and now that new test
        case, and all existing unit tests, pass.

        OK changes I made in this take:

        • Doron, I removed the writer.close() that I had added previously,
          and then put a comment there summarizing the details above.
        • I reworked the class, to fix how I enforce that only one Channel
          is used per lock file: I now use a HashSet (in the NativeFSLock
          class) to keep track of all currently held locks in the JVM. A
          Channel is only held open while the lock is held. Also put back
          constructors for the class instead of the "getLockFactory(...)"
          static methods.
        • Updated the javadocs a bit.
        • Added unit test cases (described above).
        • Fixed one other test case for NativeFSLockFactory that was testing
          for identical LockFactories, Locks, etc., which is no longer a
          correct test (since these instances are now different).
        Show
        mikemccand Michael McCandless added a comment - OK, take two. First off, I added a new test case to TestLockFactory to catch "improper lock / lock ID sharing across different directories". This test case fails with the currently committed version of NativeFSLockFactory (ie, catches the bug I had sadly created). Then, I fixed the bug in NativeFSLockFactory, and now that new test case, and all existing unit tests, pass. OK changes I made in this take: Doron, I removed the writer.close() that I had added previously, and then put a comment there summarizing the details above. I reworked the class, to fix how I enforce that only one Channel is used per lock file: I now use a HashSet (in the NativeFSLock class) to keep track of all currently held locks in the JVM. A Channel is only held open while the lock is held. Also put back constructors for the class instead of the "getLockFactory(...)" static methods. Updated the javadocs a bit. Added unit test cases (described above). Fixed one other test case for NativeFSLockFactory that was testing for identical LockFactories, Locks, etc., which is no longer a correct test (since these instances are now different).
        Hide
        mikemccand Michael McCandless added a comment -

        Also, I checked the new "Patch Available" field!

        Show
        mikemccand Michael McCandless added a comment - Also, I checked the new "Patch Available" field!
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        take two committed. Thanks for keeping on top of this Michael!

        Show
        yseeley@gmail.com Yonik Seeley added a comment - take two committed. Thanks for keeping on top of this Michael!

          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            mikemccand Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development