Lucene - Core / LUCENE-5541

FileExistsCachingDirectory, to work around unreliable File.exists

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels: None
    • Lucene Fields: New

      Description

      File.exists is a dangerous method in Java: it returns a boolean and
      cannot report errors, so if there is a low-level I/O failure
      (permission denied, out of file handles, etc.) the method can return
      false when it should return true.
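
      To illustrate the difference between "the file is missing" and "the
      check itself failed", here is a minimal sketch. It assumes Java 7+
      (java.nio.file), which is newer than what Lucene 3.6.x targets; the
      class and method names (StrictExists, existsOrThrow) are hypothetical
      and are not part of Lucene or of the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper: an existence check that does not swallow I/O errors. */
final class StrictExists {

  private StrictExists() {}

  /**
   * Returns true if the file's attributes can be read, false only if the
   * file system reports that the file is missing. Any other I/O failure
   * (permission denied, network error, ...) is propagated to the caller
   * instead of being folded into "false".
   */
  static boolean existsOrThrow(Path path) throws IOException {
    try {
      Files.readAttributes(path, BasicFileAttributes.class);
      return true;
    } catch (NoSuchFileException e) {
      return false;
    }
  }
}
```

      With a check like this, a caller sees a transient failure as an
      exception and can retry or abort, rather than silently treating an
      existing file as absent.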

      Fortunately, as of Lucene 4.x, we rely much less on File.exists,
      because we track which files the codec components created, and we know
      those files then exist.

      But, unfortunately, going from 3.0.x to 3.6.x, we increased our
      reliance on File.exists: e.g. when creating a CFS (compound file) we
      check File.exists on each sub-file before trying to add it. I have a
      customer corruption case where a transient low-level IOException
      apparently caused File.exists to incorrectly return false for one of
      the sub-files, so that sub-file was apparently left out of the CFS.
      It results in corruption like this:

        java.io.FileNotFoundException: No sub-file with id .fnm found (fileName=_1u7.cfs files: [.tis, .tii, .frq, .prx, .fdt, .nrm, .fdx])
            org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157)
            org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:146)
            org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
            org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212)
            org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228)
            org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1161)
      

      I think local file systems rarely hit such low-level errors, but
      if you have an index on a remote filesystem, where network hiccups
      can cause problems, it's more likely.

      As a simple workaround, I created a basic Directory delegator that
      holds a Set of all created but not deleted files, and short-circuits
      fileExists to return true if the file is in that set.
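
      The delegator described above could be sketched roughly as follows.
      This is not the attached patch: to keep the example self-contained,
      a hypothetical MiniDirectory interface stands in for the three
      Directory operations involved (creating, deleting, and checking a
      file), and the class name ExistsCachingDirectory is likewise an
      assumption.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the few directory operations this sketch needs. */
interface MiniDirectory {
  void createOutput(String name) throws IOException;
  void deleteFile(String name) throws IOException;
  boolean fileExists(String name) throws IOException;
}

/** Wraps another directory and remembers which files it created. */
final class ExistsCachingDirectory implements MiniDirectory {
  private final MiniDirectory delegate;

  // Names of files created through this wrapper and not yet deleted.
  private final Set<String> liveFiles =
      Collections.synchronizedSet(new HashSet<String>());

  ExistsCachingDirectory(MiniDirectory delegate) {
    this.delegate = delegate;
  }

  public void createOutput(String name) throws IOException {
    delegate.createOutput(name);
    liveFiles.add(name); // record only after the create succeeded
  }

  public void deleteFile(String name) throws IOException {
    delegate.deleteFile(name);
    liveFiles.remove(name); // forget only after the delete succeeded
  }

  public boolean fileExists(String name) throws IOException {
    // Short-circuit: a file we created and did not delete must exist,
    // whatever the underlying (possibly unreliable) check says.
    if (liveFiles.contains(name)) {
      return true;
    }
    return delegate.fileExists(name);
  }
}
```

      Files that were already present before the wrapper was created are
      not in the set, so for those the sketch still falls through to the
      underlying check.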

      I don't plan to commit this: we aren't doing bug-fix releases on
      3.6.x anymore (it's very old by now), and this problem is already
      "fixed" in 4.x (by reducing our reliance on File.exists), but I wanted
      to post the code here in case others hit it. It looks like others
      have hit it already, e.g.
      https://netbeans.org/bugzilla/show_bug.cgi?id=189571 and
      https://issues.jboss.org/browse/ISPN-2981

        Attachments

        1. LUCENE-5541.patch
          11 kB
          Michael McCandless

            People

            • Assignee: Unassigned
            • Reporter: Michael McCandless (mikemccand)
            • Votes: 0
            • Watchers: 4
