Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4
    • Fix Version/s: 2.9
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      LUCENE-638 added a check to the FSDirectory.list() method to only return files that are Lucene related. I think this change made the FSDirectory implementation inconsistent with all other methods in Directory. E.g. you can create a file with an arbitrary name using FSDirectory, fileExists() will report that it is there, deleteFile() will remove it, but the array returned by list() will not contain the file.

      The actual issue that was reported in LUCENE-638 was about sub directories. Those should clearly not be listed, but IMO it is not the responsibility of a Directory implementation to decide what kind of files can be created or listed. The Directory class is an abstraction of a directory and it should't to more than that.

      1. DirectoryTest.java
        0.8 kB
        Marcel Reutegger
      2. LUCENE-1468.patch
        30 kB
        Michael McCandless

        Activity

        Hide
        Marcel Reutegger added a comment -

        Test cases to illustrate the issue.

        Show
        Marcel Reutegger added a comment - Test cases to illustrate the issue.
        Hide
        Michael McCandless added a comment -

        I would tend to agree – Directory should try to be a "simple conduit" to the filesystem. It should not filter things in some methods and not others.

        It should be up to the caller to pick & choose which files are of interest.

        EG in LUCENE-638, RamDir should pick & choose to copy over only those files referenced by all live commit points.

        LUCENE-385 was another issue where filtering was added to FSDir so that when you opened an index with create=true, Lucene would only remove files that look like index files. This one should already be fixed (I haven't yet tested though), because we have now deprecated the create=true argument to FSDir in favor of IndexWriter's create=true. IndexWriter is smarter and careful (uses IndexFileDeleter; retries delete; keeps old commit points around if the deletion policy says so; etc.) about which files should be deleted.

        Especially as we move towards flexible indexing (LUCENE-1458), what is and is not an index file is up to the particular codec(s) in use and is no longer a simple list of file extensions. I had to change quite a few "interesting" places when I created the first codec that used new file extensions.

        So I agree neither list(), nor any other methods in FSDir, should do any filtering. But because of back compatibility: I don't think we can suddenly change list() to return all files. I think we'd have to deprecate list and then add eg "fullList" or "listAll" instead.

        Show
        Michael McCandless added a comment - I would tend to agree – Directory should try to be a "simple conduit" to the filesystem. It should not filter things in some methods and not others. It should be up to the caller to pick & choose which files are of interest. EG in LUCENE-638 , RamDir should pick & choose to copy over only those files referenced by all live commit points. LUCENE-385 was another issue where filtering was added to FSDir so that when you opened an index with create=true, Lucene would only remove files that look like index files. This one should already be fixed (I haven't yet tested though), because we have now deprecated the create=true argument to FSDir in favor of IndexWriter's create=true. IndexWriter is smarter and careful (uses IndexFileDeleter; retries delete; keeps old commit points around if the deletion policy says so; etc.) about which files should be deleted. Especially as we move towards flexible indexing ( LUCENE-1458 ), what is and is not an index file is up to the particular codec(s) in use and is no longer a simple list of file extensions. I had to change quite a few "interesting" places when I created the first codec that used new file extensions. So I agree neither list(), nor any other methods in FSDir, should do any filtering. But because of back compatibility: I don't think we can suddenly change list() to return all files. I think we'd have to deprecate list and then add eg "fullList" or "listAll" instead.
        Hide
        Michael McCandless added a comment -

        Attached patch. I think it's ready to commit. I'll wait a day or
        two.

        I deprecated Directory.list() in favor of the new
        Directory.listAll(). listAll does no filtering, and, never returns
        null (instead throws more meaningful IOExceptions).

        I added a new NoSuchDirectoryException in o.a.l.store, and
        FSDirectory.listAll throws this if the directory does not exist, or if
        it exists but isn't a directory, so you now get a clearer exception on
        opening an IndexReader an an invalid directory.

        I fixed all places in Lucene core & contrib that called list() to call
        listAll() instead. I also added test cases to make sure LUCENE-385
        and LUCENE-638 remain fixed.

        Show
        Michael McCandless added a comment - Attached patch. I think it's ready to commit. I'll wait a day or two. I deprecated Directory.list() in favor of the new Directory.listAll(). listAll does no filtering, and, never returns null (instead throws more meaningful IOExceptions). I added a new NoSuchDirectoryException in o.a.l.store, and FSDirectory.listAll throws this if the directory does not exist, or if it exists but isn't a directory, so you now get a clearer exception on opening an IndexReader an an invalid directory. I fixed all places in Lucene core & contrib that called list() to call listAll() instead. I also added test cases to make sure LUCENE-385 and LUCENE-638 remain fixed.
        Hide
        Michael McCandless added a comment -

        Committed revision 723789.

        Thanks Marcel!

        Show
        Michael McCandless added a comment - Committed revision 723789. Thanks Marcel!

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Marcel Reutegger
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development