  1. Hadoop Common
  2. HADOOP-18599

Expose `listStatus(Path path, String startFrom)` on `AzureBlobFileSystem`


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.2, 3.3.4
    • Fix Version/s: None
    • Component/s: fs/azure
    • Labels: None

    Description

      When working with Azure Blob Storage, listing operations can often be quite slow, even on storage accounts with the hierarchical namespace enabled.

      This can be mitigated by listing only a specific subset of directories using a function like `AzureBlobFileSystemStore.listStatus`, which accepts a `startFrom` argument and lists all files in order starting from there:

      https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.html#listStatus-org.apache.hadoop.fs.Path-java.lang.String-org.apache.hadoop.fs.azurebfs.utils.TracingContext-

      I'm wondering if we could add a method to `AzureBlobFileSystem`, something like:

      ```
      public FileStatus[] listStatus(final Path f, final String startFrom) throws IOException
      ```

      This would expose functionality that already exists on the underlying `AzureBlobFileSystemStore`. My understanding from reading a bit of the code is that users should mainly be dealing with `AzureBlobFileSystem`, and `AzureBlobFileSystem` seems easier to use to me, hence the benefit of exposing the method there.
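
      To make the shape of the proposal concrete, here is a minimal, self-contained sketch of the delegation. `SketchStore` and `SketchFileSystem` are hypothetical stand-ins that I made up for illustration; they are not Hadoop classes, and the in-memory `TreeSet` only imitates an ordered listing. I am also assuming that `startFrom` is inclusive and compared lexicographically, which may not match the real store exactly.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.TreeSet;

      class StartFromListingSketch {

          /** Hypothetical stand-in for the store layer (NOT AzureBlobFileSystemStore). */
          static class SketchStore {
              // Names are kept sorted so listings come back in lexicographic order.
              private final TreeSet<String> names = new TreeSet<>();

              void add(String name) {
                  names.add(name);
              }

              /** Lists names in order, beginning at the first name >= startFrom (assumed semantics). */
              List<String> listStatus(String startFrom) {
                  if (startFrom == null || startFrom.isEmpty()) {
                      return new ArrayList<>(names);
                  }
                  return new ArrayList<>(names.tailSet(startFrom, true));
              }
          }

          /** Hypothetical stand-in for the file system layer (NOT AzureBlobFileSystem). */
          static class SketchFileSystem {
              private final SketchStore store;

              SketchFileSystem(SketchStore store) {
                  this.store = store;
              }

              /** Existing-style full listing. */
              List<String> listStatus() {
                  return store.listStatus(null);
              }

              /** Proposed overload: simply forwards startFrom to the store. */
              List<String> listStatus(String startFrom) {
                  return store.listStatus(startFrom);
              }
          }

          public static void main(String[] args) {
              SketchStore store = new SketchStore();
              store.add("0008.json");
              store.add("0009.json");
              store.add("0010.json");
              store.add("0011.json");

              SketchFileSystem fs = new SketchFileSystem(store);
              System.out.println(fs.listStatus());       // all four entries
              System.out.println(fs.listStatus("0010")); // [0010.json, 0011.json]
          }
      }
      ```

      The point is only that the new overload would be a thin pass-through: callers that know roughly where the interesting entries begin can skip everything that sorts before `startFrom` instead of listing the whole directory.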

       

      I'm very unfamiliar with Java, but I'm told that keeping strictly to interfaces is strongly preferred. However, I can see some methods already on `AzureBlobFileSystem` that do not belong to any interface (e.g. `breakLease`), so I'm hoping it's acceptable to add a method like the one I described to this one `FileSystem` implementation only.

      The specific motivation for this is to unblock https://github.com/delta-io/delta/issues/1568

      I would be willing to contribute this if maintainers think the plan is reasonable. 


          People

            Assignee: Unassigned
            Reporter: Thomas Newton (Tom_Newton)
            Votes: 0
            Watchers: 4
