Lucene - Core
  1. Lucene - Core
  2. LUCENE-2316

Define clear semantics for Directory.fileLength

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      On this thread: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/201003.mbox/%3C126142c1003121525v24499625u1589bbef4c0792e7@mail.gmail.com%3E it was mentioned that Directory's fileLength behavior is not consistent between Directory implementations if the given file name does not exist. FSDirectory returns a 0 length while RAMDirectory throws FNFE.

      The problem is that the semantics of fileLength() are not defined. As proposed in the thread, we'll define the following semantics:

      • Returns the length of the file denoted by <code>name</code> if the file exists. The return value may be anything between 0 and Long.MAX_VALUE.
      • Throws FileNotFoundException if the file does not exist. Note that you can call dir.fileExists(name) if you are not sure whether the file exists or not.

      For backwards we'll create a new method w/ clear semantics. Something like:

      /**
       * @deprecated the method will become abstract when #fileLength(name) has been removed.
       */
      public long getFileLength(String name) throws IOException {
        long len = fileLength(name);
        if (len == 0 && !fileExists(name)) {
          throw new FileNotFoundException(name);
        }
        return len;
      }
      

      The first line just calls the current impl. If it throws exception for a non-existing file, we're ok. The second line verifies whether a 0 length is for an existing file or not and throws an exception appropriately.

        Activity

        Hide
        Shai Erera added a comment -

        I am not sure we should mark getFileLength deprecated though, in order to alert users that it will become abstract. Can we instead just note that in its Javadocs? It will be awkward if we deprecate both fileLength and getFileLength .

        Show
        Shai Erera added a comment - I am not sure we should mark getFileLength deprecated though, in order to alert users that it will become abstract. Can we instead just note that in its Javadocs? It will be awkward if we deprecate both fileLength and getFileLength .
        Hide
        Marvin Humphrey added a comment -

        Is it really necessary to obtain the length of a file from the Directory? Lucy
        doesn't implement that functionality, and we haven't missed it – we're able
        to get away with using the length() method on InStream and OutStream.

        I see that IndexInput and IndexOutput already have length() methods. Can you
        simply eliminate all uses of Directory.fileLength() within core and deprecate
        it without introducing a new method?

        Show
        Marvin Humphrey added a comment - Is it really necessary to obtain the length of a file from the Directory? Lucy doesn't implement that functionality, and we haven't missed it – we're able to get away with using the length() method on InStream and OutStream. I see that IndexInput and IndexOutput already have length() methods. Can you simply eliminate all uses of Directory.fileLength() within core and deprecate it without introducing a new method?
        Hide
        Michael McCandless added a comment -

        I don't think Lucene relies on file length; we do use it for diagnostics/logging (eg IW's infoStream).

        Hmm I guess we do use it for merging purposes (LogByteSizeMergePolicy), where we try to merge roughly similarly sized (by net size in bytes) segments.

        I would rather not increase our reliance on it (eg LUCENE-2373 proposes to do so) – it's a piece of metadata that is sometimes (eg over NFS) less reliable than file contents. The less we rely on from the filesystem the more robust/portable Lucene will be...

        Net/net I think we should keep the method, but restrict our use of it.

        I do think we should move to throwing FNFE if the file does not exist... though I think a break in back compat may be OK, here? (Vs do the new method/deprecated/abstract approach).

        Show
        Michael McCandless added a comment - I don't think Lucene relies on file length; we do use it for diagnostics/logging (eg IW's infoStream). Hmm I guess we do use it for merging purposes (LogByteSizeMergePolicy), where we try to merge roughly similarly sized (by net size in bytes) segments. I would rather not increase our reliance on it (eg LUCENE-2373 proposes to do so) – it's a piece of metadata that is sometimes (eg over NFS) less reliable than file contents. The less we rely on from the filesystem the more robust/portable Lucene will be... Net/net I think we should keep the method, but restrict our use of it. I do think we should move to throwing FNFE if the file does not exist... though I think a break in back compat may be OK, here? (Vs do the new method/deprecated/abstract approach).
        Hide
        Earwin Burrfoot added a comment -

        I like Marvin's suggestion - deprecate/delete Dir.fileLength, and rely on InStream.length if needed (I believe if we actually open the file for reading, NFS gives us the correct length?).
        If not - throw FNFE, and go for backcompat break instead of new/deprecate cycle.

        Show
        Earwin Burrfoot added a comment - I like Marvin's suggestion - deprecate/delete Dir.fileLength, and rely on InStream.length if needed (I believe if we actually open the file for reading, NFS gives us the correct length?). If not - throw FNFE, and go for backcompat break instead of new/deprecate cycle.
        Hide
        Shai Erera added a comment -

        Well ... dir.fileLength is also used by SegmentInfos.sizeInBytes to compute the size of all the files in the Directory. If we remove fileLength, then SI will need to call dir.openInput.length() and the close it? Seems like a lot of work to me, for just obtaining the length of the file. So I agree that if you have an IndexInput at hand, you should call its length() method rather than Dir.fileLength. But otherwise, if you just have a name at hand, a dir.fileLength is convenient?

        I'm also ok w/ the bw break rather than going through the new/deprecate cycle.

        Show
        Shai Erera added a comment - Well ... dir.fileLength is also used by SegmentInfos.sizeInBytes to compute the size of all the files in the Directory. If we remove fileLength, then SI will need to call dir.openInput.length() and the close it? Seems like a lot of work to me, for just obtaining the length of the file. So I agree that if you have an IndexInput at hand, you should call its length() method rather than Dir.fileLength. But otherwise, if you just have a name at hand, a dir.fileLength is convenient? I'm also ok w/ the bw break rather than going through the new/deprecate cycle.
        Hide
        Michael McCandless added a comment -

        I'm also ok w/ the bw break rather than going through the new/deprecate cycle.

        +1

        I think we should keep both Dir.fileLength and II.fileLength.

        Show
        Michael McCandless added a comment - I'm also ok w/ the bw break rather than going through the new/deprecate cycle. +1 I think we should keep both Dir.fileLength and II.fileLength.
        Hide
        Shai Erera added a comment -

        Patch clarifies the contract, fixes the directories to adhere to it and adds a CHANGES under backwards section. All tests pass.

        Show
        Shai Erera added a comment - Patch clarifies the contract, fixes the directories to adhere to it and adds a CHANGES under backwards section. All tests pass.
        Hide
        Shai Erera added a comment -

        That's a fairly trivial patch, therefore I plan to commit tomorrow. Just FYI

        Show
        Shai Erera added a comment - That's a fairly trivial patch, therefore I plan to commit tomorrow. Just FYI
        Hide
        Shai Erera added a comment -

        Committed revision 933879.

        Show
        Shai Erera added a comment - Committed revision 933879.
        Hide
        Shai Erera added a comment -

        backport to 3.1

        Show
        Shai Erera added a comment - backport to 3.1
        Hide
        Shai Erera added a comment -

        Committed revision 941468.

        Show
        Shai Erera added a comment - Committed revision 941468.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development