Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: fs/s3
    • Labels: None

    Description

      Looking at the logs of LocatedFileStatus/FileInputFormat scans, there is a needless call to getFileStatus() whenever S3AFileSystem.listLocatedStatus() is called:

      1. S3AFileSystem.listLocatedStatus() calls getFileStatus() first and, if the path is a file, returns that file status.
      2. But all the uses in the MR code, in FileInputFormat and LocatedFileStatusFetcher, only call this method when they already know the destination is a directory.

      This means that for every unguarded S3 path there are two needless HEAD requests and a single-entry LIST before the real LIST is initiated.

      If the S3A FS can assume that the destination is a non-empty directory, it can go straight to the LIST operation, falling back to the HEAD on the path and the HEAD on the path + "/" only if that fails.
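
      The difference in request counts can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not the S3A implementation: CountingStore, probeThenList and listFirst are hypothetical names, and the store simply counts simulated HEAD and LIST requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for an object store; counts requests. */
class CountingStore {
    private final TreeSet<String> keys = new TreeSet<>();
    int heads = 0;   // number of simulated HEAD requests
    int lists = 0;   // number of simulated LIST requests

    void put(String key) { keys.add(key); }

    /** Simulated HEAD: does an object with exactly this key exist? */
    boolean head(String key) {
        heads++;
        return keys.contains(key);
    }

    /** Simulated LIST: up to maxKeys keys that start with the prefix. */
    List<String> list(String prefix, int maxKeys) {
        lists++;
        List<String> out = new ArrayList<>();
        // Keys sharing a prefix are contiguous in sorted order.
        for (String k : keys.tailSet(prefix)) {
            if (!k.startsWith(prefix) || out.size() >= maxKeys) break;
            out.add(k);
        }
        return out;
    }
}

class ListFirstSketch {
    /** Probe first: HEAD path, HEAD path + "/", single-entry LIST, real LIST. */
    static List<String> probeThenList(CountingStore s, String path) {
        if (s.head(path)) return List.of(path);          // path is a file
        boolean isDir = s.head(path + "/")                // directory marker
                || !s.list(path + "/", 1).isEmpty();      // single-entry LIST
        if (!isDir) throw new IllegalArgumentException("not found: " + path);
        return s.list(path + "/", Integer.MAX_VALUE);     // the real LIST
    }

    /** List first: assume a non-empty directory, probe only if LIST is empty. */
    static List<String> listFirst(CountingStore s, String path) {
        List<String> children = s.list(path + "/", Integer.MAX_VALUE);
        if (!children.isEmpty()) return children;         // common case: done
        return probeThenList(s, path);                    // fallback probes
    }

    public static void main(String[] args) {
        CountingStore a = new CountingStore();
        a.put("data/part-0");
        a.put("data/part-1");
        probeThenList(a, "data");
        // prints "probe-first: 2 HEAD, 2 LIST"
        System.out.println("probe-first: " + a.heads + " HEAD, " + a.lists + " LIST");

        CountingStore b = new CountingStore();
        b.put("data/part-0");
        b.put("data/part-1");
        listFirst(b, "data");
        // prints "list-first: 0 HEAD, 1 LIST"
        System.out.println("list-first: " + b.heads + " HEAD, " + b.lists + " LIST");
    }
}
```

      In this model the list-first variant costs one extra LIST when the path turns out to be a file or is missing, which is the trade-off accepted for callers that already expect a directory.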

      We could also think about doing the same for listStatus().

            People

              Assignee: Mukund Thakur (mukund-thakur)
              Reporter: Steve Loughran (stevel@apache.org)