Lucene - Core / LUCENE-635

[PATCH] Decouple locking implementation from Directory implementation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1
    • Component/s: core/index
    • Labels:
      None

      Description

      This is a spinoff of http://issues.apache.org/jira/browse/LUCENE-305.

      I've opened this new issue because its scope is wider than that of
      LUCENE-305.

      This is a patch originally created by Jeff Patterson (see above link)
      and then modified as described here:

      http://issues.apache.org/jira/browse/LUCENE-305#action_12418493

      with some small additional changes:

      • For each FSDirectory.getDirectory(), I made a corresponding
        version that also accepts a LockFactory instance. So, you can
        construct an FSDirectory with your own LockFactory.
      • Cascaded defaulting for FSDirectory's LockFactory implementation:
        if you pass in a LockFactory instance, it's used; else if
        setDisableLocks was called, we use NoLockFactory; else, if the
        system property "org.apache.lucene.store.FSDirectoryLockFactoryClass"
        is defined, we use that; finally, we'll use the original locking
        implementation (SimpleFSLockFactory).
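
The cascaded defaulting in the second bullet can be sketched as follows. This is a minimal standalone sketch, not the code from the patch: `LockFactory`, `NoLockFactory`, and `SimpleFSLockFactory` are empty stand-ins for the real classes, `chooseLockFactory` is a hypothetical helper, and instantiating the class named by the system property through a no-argument constructor is an assumption about how the property is consumed.

```java
// Empty stand-ins for the classes named in the description; the real ones
// live in org.apache.lucene.store and contain the actual locking logic.
abstract class LockFactory {}
class NoLockFactory extends LockFactory {}
class SimpleFSLockFactory extends LockFactory {}

class LockFactoryDefaults {
    // Property name as given in the issue description.
    static final String PROP = "org.apache.lucene.store.FSDirectoryLockFactoryClass";

    // Hypothetical helper: picks the factory an FSDirectory would end up with.
    static LockFactory chooseLockFactory(LockFactory explicit, boolean locksDisabled) {
        if (explicit != null) {
            return explicit;                 // 1. caller-supplied instance wins
        }
        if (locksDisabled) {
            return new NoLockFactory();      // 2. setDisableLocks(true) was called
        }
        String className = System.getProperty(PROP);
        if (className != null) {             // 3. system property names a class
            try {
                // Assumption: the named class has a no-argument constructor.
                return (LockFactory) Class.forName(className)
                        .getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + className, e);
            }
        }
        return new SimpleFSLockFactory();    // 4. original default
    }

    public static void main(String[] args) {
        System.out.println(chooseLockFactory(null, true).getClass().getSimpleName());
        System.out.println(chooseLockFactory(null, false).getClass().getSimpleName());
    }
}
```

The order of the checks is the point of the sketch: each fallback is consulted only when every earlier source is absent.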

      The gist is that all locking code has been moved out of *Directory and
      into subclasses of a new abstract LockFactory class. You can now set
      the LockFactory of a Directory to change how it does locking. For
      example, you can create an FSDirectory but set its locking to
      SingleInstanceLockFactory (if you know all writing/reading will take
      place in a single JVM).

      The changes pass all unit tests (on Ubuntu Linux Sun Java 1.5 and
      Windows XP Sun Java 1.4), and I added another TestCase to test the
      LockFactory code.

      Note that LockFactory defaults are not changed: FSDirectory defaults
      to SimpleFSLockFactory and RAMDirectory defaults to
      SingleInstanceLockFactory.

      Next step (separate issue) is to create a LockFactory that uses the OS
      native locks (through java.nio).

      1. patch-Jul26.tar
        90 kB
        Michael McCandless
      2. LUCENE-635-Aug3.patch
        43 kB
        Michael McCandless
      3. LUCENE-635-Aug27.patch
        43 kB
        Michael McCandless

          Activity

          Doron Cohen added a comment -

          > We could (as you're suggesting) indeed extend FSDirectory so that it
          > provided the low level methods required by a locking implementation,
          > and then alter SimpleFSLockFactory/NativeFSLockFactory (or make a new
          > LockFactory) so that all underlying IO is through the FSDirectory instead.

          Yes, this is exactly (and only) what I am suggesting to consider - to include a Directory member within the LockFactory so that it is clear that any LockFactory implementation operates in the realm of a directory (implementation) and is using it for any actual store accesses.

          Michael McCandless added a comment -

          With this change, "Directory on DB", "Directory on RAM", etc., still
          work correctly. In fact you can completely override the LockFactory
          behavior by implementing your own "makeLock" in a subclass of
          Directory if you want to.

          This change just opens up the freedom to allow you to separately
          choose how your locking is done. I think this is important because
          many applications have different locking requirements. Perhaps you
          require no locking at all (NoLockFactory or legacy
          FSDirectory.setDisableLocks), or everything happens in one JVM
          (SingleInstanceLockFactory), etc.

          This also opens up the chance for people to work around locking issues
          (e.g., over NFS) until we can get lock-less commits finished.

          I'm working on a LockFactory implementation that uses native OS locks
          (java.nio.*) and this will be another place that accesses the file
          system. The java.io.File.createNewFile (used by the
          SimpleFSLockFactory) has a very spooky warning about not using it for
          locking.
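
For context, the Javadoc for java.io.File.createNewFile notes that the method should not be used for file locking and points to the FileLock facility instead. The following is a minimal sketch of what taking an OS-native lock through java.nio can look like. It is not the LockFactory being described in this comment: the class `NativeLockSketch` and its method names are hypothetical, and it only illustrates `FileChannel.tryLock()`.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical class, for illustration only.
class NativeLockSketch {
    private final File lockFile;
    private RandomAccessFile raf;   // kept open while the lock is held
    private FileLock lock;

    NativeLockSketch(File lockFile) {
        this.lockFile = lockFile;
    }

    // Tries to take an exclusive OS-level lock without blocking.
    synchronized boolean obtain() throws IOException {
        if (lock != null) {
            return false;           // this object already holds the lock
        }
        RandomAccessFile f = new RandomAccessFile(lockFile, "rw");
        FileLock l = null;
        try {
            // Returns null when another process holds an overlapping lock.
            l = f.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // The lock is already held elsewhere in this JVM.
        } finally {
            if (l == null) {
                f.close();          // do not leak the file handle on failure
            }
        }
        if (l == null) {
            return false;
        }
        raf = f;
        lock = l;
        return true;
    }

    synchronized void release() throws IOException {
        if (lock != null) {
            lock.release();
            raf.close();
            lock = null;
            raf = null;
        }
    }
}
```

Unlike a marker file created with createNewFile, a lock taken this way is released by the operating system if the process dies, so no stale lock file needs to be cleaned up by hand.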

          We could (as you're suggesting) indeed extend FSDirectory so that it
          provided the low level methods required by a locking implementation,
          and then alter SimpleFSLockFactory/NativeFSLockFactory (or make a new
          LockFactory) so that all underlying IO is through the FSDirectory
          instead.

          Doron Cohen added a comment -

          While updating my patch for 665 according to the changes here, I noticed something - I may be wrong here - but it seems to me that until this change, all the actual FS access operations were performed by FSDirectory, using the Directory API.

          The new SimpleFSLock and SimpleFSLockFactory also access the FS directly, not through FSDirectory API.

          That Directory abstraction in Lucene allows one to develop Lucene-in-RAM, Lucene-in-DB, etc. It is a nice feature.

          Guess we can say: "well, now the abstraction is made of two interfaces - Lock and Directory, just make sure you use 'matching' implementations of them." This seems weaker than before.

          Or, we can limit all file access to go through FSDirectory:

          • one possibility is to add to LockFactory a Directory object (as a class member); SimpleFSLockFactory can require that Directory object to be an FSDirectory (cast, and fail otherwise); also, FSDirectory should be extended with createSingleFile(), mkdirs() and isDirectory().
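
A rough sketch of this proposal, under stated assumptions: every type below is a simplified stand-in rather than the real Lucene class, the `DirectoryBound...` names are hypothetical, and the three FSDirectory methods are the ones proposed in this comment with assumed signatures.

```java
import java.io.File;
import java.io.IOException;

// Simplified stand-ins; the real Directory classes have a much larger API.
abstract class Directory {}

class RAMDirectory extends Directory {}

class FSDirectory extends Directory {
    private final File dir;

    FSDirectory(File dir) {
        this.dir = dir;
    }

    // The three methods proposed in the comment; signatures are assumptions.
    boolean isDirectory() { return dir.isDirectory(); }
    boolean mkdirs() { return dir.mkdirs(); }
    boolean createSingleFile(String name) throws IOException {
        return new File(dir, name).createNewFile();
    }
}

// Hypothetical base class: the factory carries the Directory it works in.
abstract class DirectoryBoundLockFactory {
    protected final Directory directory;

    DirectoryBoundLockFactory(Directory directory) {
        this.directory = directory;
    }

    abstract boolean obtain(String lockName) throws IOException;
}

class DirectoryBoundSimpleFSLockFactory extends DirectoryBoundLockFactory {
    DirectoryBoundSimpleFSLockFactory(Directory directory) {
        super(directory);
        // "Cast, and fail otherwise": reject non-filesystem directories early.
        if (!(directory instanceof FSDirectory)) {
            throw new IllegalArgumentException("an FSDirectory is required");
        }
    }

    @Override
    boolean obtain(String lockName) throws IOException {
        // All file system access goes through the FSDirectory API.
        return ((FSDirectory) directory).createSingleFile(lockName);
    }
}
```

With this shape, pairing a file-system lock factory with a non-matching Directory (for example a `RAMDirectory`) fails at construction time instead of silently touching the file system.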
          Yonik Seeley added a comment -

          Committed. Thanks Michael!

          Michael McCandless added a comment -

          OK, I agree. I've updated the CHANGES.txt to state this small change.

          And I've fixed SimpleFSLockFactory to move directory existence checking & creation back into the obtain() method.

          New patch attached!

          Yonik Seeley added a comment -

          > I would lean towards keeping the small change to "setDisableLocks()".
          > Meaning, it's only when you create an FSDirectory that the static
          > "disableLocks" value is checked.

          I think this is probably OK. In addition to being a little-used method, if one truly wanted locking disabled (for read-only media for example) they would be calling setDisableLocks() before opening an IndexReader anyway.

          Michael McCandless added a comment -

          OK, does anyone have a strong opinion one way or another on these
          small changes?

          I would lean towards keeping the small change to "setDisableLocks()".
          Meaning, it's only when you create an FSDirectory that the static
          "disableLocks" value is checked. So, changing disableLocks would no
          longer retroactively affect all previously created FSDirectories,
          which seems too "powerful" – what if I wanted some to be disabled and
          others not? Was it intentional that it was this powerful? If we do
          this we could document it in CHANGES.txt as a small difference. Or,
          again, I can put back the old behaviour if people think that's best.
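
The difference between the two behaviours can be illustrated with a small model. This is only a sketch: `FlagSemanticsDemo` and its nested classes are hypothetical, and the static field merely models FSDirectory's private static "disableLocks" flag.

```java
class FlagSemanticsDemo {
    // Models the private static flag that setDisableLocks() sets.
    static boolean disableLocks = false;

    // Old behaviour: the static flag is consulted on every use, so changing
    // it affects directories that already exist.
    static class RetroactiveDir {
        boolean locksEnabled() {
            return !disableLocks;
        }
    }

    // Patched behaviour: the flag is read once, when the directory is created.
    static class SnapshotDir {
        private final boolean enabled = !disableLocks;

        boolean locksEnabled() {
            return enabled;
        }
    }

    public static void main(String[] args) {
        RetroactiveDir oldStyle = new RetroactiveDir();
        SnapshotDir newStyle = new SnapshotDir();

        disableLocks = true; // flipped after both directories were created

        System.out.println(oldStyle.locksEnabled());          // false
        System.out.println(newStyle.locksEnabled());          // true
        System.out.println(new SnapshotDir().locksEnabled()); // false
    }
}
```

With the snapshot behaviour, an application can keep locking enabled on some directories and disabled on others by toggling the flag between creations.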

          On the second one, I agree we should keep the current behaviour of
          checking existence of & creating the LOCK DIR with each obtain. There
          would be some performance benefit to only doing it on creating the
          lock factory, but, I don't think that's worth the risk of the change.
          So I'll go ahead & fix that one.
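
A minimal sketch of the "check and create the lock directory on every obtain" behaviour, assuming a lock represented by a marker file created with java.io.File.createNewFile. The class `LockDirPerObtain` is hypothetical and is not the SimpleFSLock code from the patch.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical class, for illustration only.
class LockDirPerObtain {
    private final File lockDir;
    private final File lockFile;

    LockDirPerObtain(File lockDir, String lockName) {
        // Nothing is created here; the directory is handled in obtain().
        this.lockDir = lockDir;
        this.lockFile = new File(lockDir, lockName);
    }

    boolean obtain() throws IOException {
        // Re-checked on every call, so a lock directory that was removed
        // after construction is recreated instead of causing a failure.
        if (!lockDir.exists()) {
            if (!lockDir.mkdirs()) {
                throw new IOException("Cannot create lock directory: " + lockDir);
            }
        } else if (!lockDir.isDirectory()) {
            throw new IOException("Not a directory: " + lockDir);
        }
        // Atomically creates the marker file; false means it already exists.
        return lockFile.createNewFile();
    }

    void release() {
        lockFile.delete();
    }
}
```

Doing the check once in the factory's constructor would save a few file system calls per obtain, which is the performance trade-off mentioned above.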

          Yonik Seeley added a comment -

          Yeah... those were the slight differences in external behavior I saw.
          That doesn't mean it's wrong, but it does mean we should examine if it's OK to change it (or just defer the changes to a later patch...).

          Michael McCandless added a comment -

          Thank you! I agree, locking is sneaky and requires very thorough
          review & testing.

          Nice, I definitely like that more compact version of
          SingleInstanceLockFactory.obtain – I'll fold that in.

          On FSDirectory.disableLocks, which is a private static boolean set by
          "setDisableLocks": if this is "true" when the FSDirectory is created,
          then FSDirectory uses the NoLockFactory for its locking; else it uses
          the default SimpleFSLockFactory. (This applies only when the caller
          did not provide a LockFactory instance.)

          OOH I do see one difference: in the current code, if you call
          setDisableLocks then this affects even a previously created
          FSDirectory. But with my changes, only newly created FSDirectory
          instances will have locking disabled. I.e., it's no longer
          "retroactive" to all previously created FSDirectory instances.
          Hmm. OK I will fix this case.

          On SimpleFSLock.obtain, you are correct: I lost the creation of the
          lock dir (if it doesn't exist) with each obtain. Good catch! I
          didn't mean to lose it. I will put it back in, and move it out of the
          init() method in SimpleFSLockFactory.

          Thanks for reviewing this!

          Yonik Seeley added a comment -

          Very nice job Michael... very thorough.
          In general, locking & synchronization is something that requires hard review since it's hard to test for correctness, but the thoroughness of your tests increases my confidence.

          Super-minor improvement while I'm looking at it: could the following
          method body be replaced with "synchronized(locks) { return locks.add(lockName); }"?

          + public boolean obtain() throws IOException {
          +   synchronized(locks) {
          +     if (!locks.contains(lockName)) {
          +       locks.add(lockName);
          +       return true;
          +     } else {
          +       return false;
          +     }
          +   }
          + }

          As far as backward compatibility, could you speak to
          1) FSDirectory.disableLocks
          2) Slight changes in how FSLock.obtain works (the old one recreated the lock dir each time)
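
The simplification suggested earlier in this comment relies on the contract of java.util.Set.add, which returns true only if the element was not already present. A small sketch comparing the two forms follows; the class `ObtainEquivalence` and its method names are hypothetical, and generics are used for readability even though the patch itself would not have used them.

```java
import java.util.HashSet;
import java.util.Set;

class ObtainEquivalence {
    // The form in the patch: explicit contains() check, then add().
    static boolean obtainVerbose(Set<String> locks, String lockName) {
        synchronized (locks) {
            if (!locks.contains(lockName)) {
                locks.add(lockName);
                return true;
            } else {
                return false;
            }
        }
    }

    // The suggested form: Set.add() already reports whether the set changed.
    static boolean obtainCompact(Set<String> locks, String lockName) {
        synchronized (locks) {
            return locks.add(lockName);
        }
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>();
        Set<String> b = new HashSet<>();
        // First attempt succeeds and second fails, for both variants.
        System.out.println(obtainVerbose(a, "write.lock") + " " + obtainCompact(b, "write.lock"));
        System.out.println(obtainVerbose(a, "write.lock") + " " + obtainCompact(b, "write.lock"));
    }
}
```

Both variants leave the set in the same state and return the same value for every call.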

          Michael McCandless added a comment -

          Awesome, thanks Otis! Have a great vacation!

          Otis Gospodnetic added a comment -

          I took a look at it a few weeks back. If nobody takes care of it, I'll look at it again and hopefully commit it after I return from vacation in September.

          Michael McCandless added a comment -

          Has anyone had a chance to look at this patch?

          This should be fully backwards compatible: old APIs have not changed.
          I've just added new ones that allow you to set the locking
          implementation per Directory. The default Locking implementation also
          has not changed; it's just been refactored out of the *Directory.java
          sources. So this should be a drop-in change to existing users of
          Lucene.

          This change passes all unit tests, and I added a new test (with 9 test
          cases) for LockFactory.

          The above LUCENE-635-Aug3.patch still applies cleanly to the
          current svn HEAD (431322).

          Michael McCandless added a comment -

          This patch contains the same source changes as my July 26 patch, but this one is done "correctly" as the output of a single top-level "svn diff" command (i.e., I ran "svn add ..." locally for the new files). I also added an entry to CHANGES.txt, and corrected newlines on one of the sources.

          Michael McCandless added a comment -

          TAR file containing sources as first cut at implementation. I've also included patch files off revision 425918.


            People

            • Assignee:
              Yonik Seeley
              Reporter:
              Michael McCandless
            • Votes:
              0
              Watchers:
              4
