Lucene - Core / LUCENE-1011

Two or more writers over NFS can cause index corruption

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.4, 2.9
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

      Description

      When an index is used over NFS, and, more than one machine can be a
      writer such that they swap roles quickly, it's possible for the index
      to become corrupt if the NFS client directory cache is stale.

      Not all NFS clients will show this. Very recent versions of Linux's
      NFS client do not seem to show the issue, yet, slightly older ones do,
      and the latest Mac OS X one does as well.

      I've been working with Patrick Kimber, who provided a standalone test
      showing the problem (thank you Patrick!). This came out of this
      thread:

      http://www.gossamer-threads.com/lists/engine?do=post_view_flat;post=50680;page=1;sb=post_latest_reply;so=ASC;mh=25;list=lucene

      Note that the first issue in that discussion has been resolved
      (LUCENE-948). This is a new issue.

        Attachments

        1. LUCENE-1011.patch (31 kB) - Michael McCandless

        Activity

        mikemccand Michael McCandless added a comment -

        I just committed this. Thanks Patrick!

        mikemccand Michael McCandless added a comment -

        > I'm not an expert on file locking (either in Lucene, in the JVM, or
        > in any OS), but I have to wonder if the problems you are seeing are
        > inherent in the Java FileLock API, or if they only manifest in
        > specific implementations (i.e. certain JVM impls, certain
        > filesystems, certain combinations of NFS client/server, etc.)

        I'm no expert either, and I continue to be rather shocked each time I
        learn more!

        > If we can say "NativeFSLockFactory uses the Java FileLock API to
        > provide locking. FileLock is known to be buggy in the following
        > situations: ..." then we've done all we can do, correct?

        I agree, I think this is exactly what we should do. I'll update the
        javadoc for NativeFSLockFactory with this statement.

        hossman Hoss Man added a comment -

        : In testing in my NFS area (mix of Linux & OS X), I see
        : NativeFSLockFactory sometimes (rarely) allowing a lock to be
        : double-acquired. Whereas after stress testing SimpleFSLockFactory for
        : a looong time, it never allows that.
        :
        : So the NFS challenge/saga continues: now, you should in fact use
        : SimpleFSLockFactory, and work around the fact that you will sometimes
        : have to manually remove lock files (it is the lesser of evils).

        I'm not an expert on file locking (either in Lucene, in the JVM, or
        in any OS), but I have to wonder if the problems you are seeing are
        inherent in the Java FileLock API, or if they only manifest in
        specific implementations (i.e. certain JVM impls, certain
        filesystems, certain combinations of NFS client/server, etc.)

        If we can say "NativeFSLockFactory uses the Java FileLock API to
        provide locking. FileLock is known to be buggy in the following
        situations: ..." then we've done all we can do, correct?

        mikemccand Michael McCandless added a comment -

        Once I got through the locking issue (switched Patrick's test to use
        SimpleFSLockFactory), I could no longer reproduce his issue, but he
        could in his environment. So I worked out a simple change to how the
        segments_N file is located: instead of first trying the directory
        listing and only falling back to reading segments.gen, always try
        both and use whichever generation is larger. This way we can
        tolerate a stale directory cache, or a stale file contents cache
        (though not both at the same time). In Patrick's testing this looks
        to have resolved the issue.
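
        As a rough illustration of that selection logic (this is a sketch,
        not Lucene's actual code; the base-36 suffix parsing and the
        readGenFromSegmentsGen helper are assumptions made for illustration):

          import java.io.File;

          class StaleCacheTolerantSegmentsLookup {

            static long pickCurrentGeneration(File indexDir) {
              // Candidate 1: largest segments_N seen in the directory listing
              // (the listing itself may come from a stale NFS cache).
              long genFromListing = -1;
              String[] names = indexDir.list();
              if (names != null) {
                for (String name : names) {
                  if (name.startsWith("segments_")) {
                    long gen = Long.parseLong(name.substring("segments_".length()), 36);
                    genFromListing = Math.max(genFromListing, gen);
                  }
                }
              }

              // Candidate 2: the generation recorded inside segments.gen
              // (its contents may also be cached and stale).
              long genFromGenFile = readGenFromSegmentsGen(indexDir);

              // Trust whichever generation is larger: this tolerates a stale
              // directory cache OR a stale segments.gen, though not both at once.
              return Math.max(genFromListing, genFromGenFile);
            }

            static long readGenFromSegmentsGen(File indexDir) {
              // Placeholder only; the real format of segments.gen is not
              // reproduced here.
              return -1;
            }
          }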

        I also fixed IndexFileDeleter to specifically try loading the current
        commit point if that point was not seen in the directory listing
        (which would happen if the directory listing cache was stale), and
        improved messaging in IndexWriter (when you call setInfoStream(...))
        to print more details about the configuration of the writer, to aid
        in future remote debugging.

        mikemccand Michael McCandless added a comment -

        Using the lock verifier above, I discovered something shocking (to
        me): NativeFSLockFactory is in general NOT RELIABLE for locking over
        NFS, while SimpleFSLockFactory is reliable modulo the "fails to delete
        on exit/crash" minor issue.

        This is unexpected because the whole reason we originally created
        NativeFSLockFactory was to improve locking over "challenging"
        filesystems like NFS. The spooky comment in Sun's javadocs on using
        File.createNewFile for locking (which is what SimpleFSLockFactory
        uses) drove this:

        http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html#createNewFile()

        But then I remembered Marvin's comment about this:

        http://issues.apache.org/jira/browse/LUCENE-710#action_12466911

        And on following that lead, indeed, that comment "Note: this method
        should not be used for file-locking, as the resulting protocol cannot
        be made to work reliably" is only referring to the fact that you
        cannot reliably guarantee this lock file will be properly removed.

        In testing in my NFS area (mix of Linux & OS X), I see
        NativeFSLockFactory sometimes (rarely) allowing a lock to be
        double-acquired. Whereas after stress testing SimpleFSLockFactory for
        a looong time, it never allows that.

        So the NFS challenge/saga continues: now, you should in fact use
        SimpleFSLockFactory, and work around the fact that you will sometimes
        have to manually remove lock files (it is the lesser of evils).
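
        For anyone hitting this, opening the shared index with
        SimpleFSLockFactory looks roughly like the sketch below. It assumes
        the Lucene 2.x-era API (FSDirectory.getDirectory taking an explicit
        LockFactory); exact signatures differ in later releases, and the
        index path is just an example:

          import org.apache.lucene.analysis.standard.StandardAnalyzer;
          import org.apache.lucene.index.IndexWriter;
          import org.apache.lucene.store.Directory;
          import org.apache.lucene.store.FSDirectory;
          import org.apache.lucene.store.SimpleFSLockFactory;

          public class NfsWriterSketch {
            public static void main(String[] args) throws Exception {
              String indexPath = "/mnt/nfs/index";  // example path on the shared mount

              // Keep the lock files on the same shared filesystem so every
              // writer machine sees them.
              SimpleFSLockFactory lockFactory = new SimpleFSLockFactory(indexPath);
              Directory dir = FSDirectory.getDirectory(indexPath, lockFactory);

              IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
              try {
                // ... add or delete documents ...
              } finally {
                writer.close();
              }
              // If a writer crashes, the write.lock file can be left behind
              // and must be removed by hand before another writer can
              // acquire the lock.
            }
          }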

        mikemccand Michael McCandless added a comment -

        Attaching patch. All tests pass and I think this is ready for
        commit. I'll wait a few days.

        What's always tricky about debugging this kind of issue is figuring
        out if it's a locking problem (two writers are incorrectly getting the
        write lock at the same time), or if it's an IO "stale cache" issue.

        To help with this, I created some basic instrumentation to "verify"
        that locking is functioning correctly:

        • A new LockFactory called VerifyingLockFactory, which just wraps a
          pre-existing LockFactory and every time a lock is obtained or
          released, contacts the LockVerifyServer (over a socket) to verify
          the lock is not held by another process. If it is held by another
          process, meaning the LockFactory is broken, an exception is
          thrown.
        • LockVerifyServer.java (main) which just runs forever, accepting &
          verifying these socket connections.
        • A standalone (main) LockStressTest.java, whose sole purpose is to
          obtain/release a specified lock file, very frequently. You run
          this on multiple machines, pointing to the same lock file, to
          verify your LockFactory is working correctly.

        Using these additions, you can stress test locking in your
        particular environment to determine whether your LockFactory is
        working properly.
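
        The heart of what the verify server checks can be sketched like this
        (a conceptual illustration only, not the actual LockVerifyServer code
        or its socket protocol):

          import java.util.HashMap;
          import java.util.Map;

          // Conceptual sketch: each stress-test process reports its lock
          // acquires and releases (in the real setup, VerifyingLockFactory
          // sends these over a socket). If an acquire arrives while a
          // different process still holds the same lock, the LockFactory
          // under test is broken.
          class LockVerifierSketch {
            private final Map<String, Integer> heldBy = new HashMap<String, Integer>();

            synchronized void onAcquire(String lockName, int processId) {
              Integer current = heldBy.get(lockName);
              if (current != null && current.intValue() != processId) {
                throw new IllegalStateException("lock " + lockName
                    + " double-acquired: held by process " + current
                    + ", also acquired by process " + processId);
              }
              heldBy.put(lockName, Integer.valueOf(processId));
            }

            synchronized void onRelease(String lockName, int processId) {
              heldBy.remove(lockName);
            }
          }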

        I plan on committing these three source files so that others can
        diagnose locking issues using the Lucene core jar.


          People

          • Assignee: mikemccand Michael McCandless
          • Reporter: mikemccand Michael McCandless
          • Votes: 0
          • Watchers: 0
