    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None

      Description

      Looking at the logs of LocatedFileStatus/FileInputFormat scans, there is a needless call to getFileStatus whenever an S3AFileSystem.listLocatedStatus() call is made.

      1. S3AFileSystem.listLocatedStatus() does a getFileStatus call first, which returns the file status.
      2. But all the uses in the MR code, in FileInputFormat and LocatedFileStatusFetcher, only call this method when they already know the destination is a directory.

      This means that every unguarded S3 path costs two needless HEADs and a single-entry LIST before the real LIST is initiated.

      If the S3A FS can assume that the destination is a non-empty directory, it can go straight to the LIST operation, falling back to the HEAD on the path and the HEAD on the path + "/" only if that fails.
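
      The proposal above can be illustrated with a minimal sketch. It does not use the real S3AFileSystem or AWS SDK code: FakeStore, its head()/list() methods, and listAssumingDirectory() are hypothetical names invented for this example, and the store is an in-memory key set that only counts requests. In this simplified model a LIST under "path/" also sees any "path/" directory marker, so the fallback is reduced to a single HEAD on the path itself.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class ListFirstSketch {

    /** Hypothetical in-memory stand-in for an object store; it counts requests. */
    public static class FakeStore {
        public final TreeSet<String> keys = new TreeSet<>();
        public int heads = 0;
        public int lists = 0;

        /** Models a HEAD request: does an object with exactly this key exist? */
        public boolean head(String key) {
            heads++;
            return keys.contains(key);
        }

        /** Models a LIST request: every key that starts with the prefix. */
        public List<String> list(String prefix) {
            lists++;
            List<String> out = new ArrayList<>();
            // Keys sharing a prefix are contiguous in sorted order.
            for (String k : keys.tailSet(prefix)) {
                if (!k.startsWith(prefix)) {
                    break;
                }
                out.add(k);
            }
            return out;
        }
    }

    /**
     * Assumes the path is a directory and issues the LIST straight away.
     * Only when the LIST finds nothing does it probe for a file with a HEAD.
     */
    public static List<String> listAssumingDirectory(FakeStore store, String path)
            throws FileNotFoundException {
        String prefix = path.endsWith("/") ? path : path + "/";
        List<String> found = store.list(prefix);
        if (!found.isEmpty()) {
            found.remove(prefix); // drop the directory marker, if one was listed
            return found;         // common case: one LIST, no HEADs
        }
        // Fallback: the path may be a plain file rather than a directory.
        if (store.head(path)) {
            List<String> single = new ArrayList<>();
            single.add(path);
            return single;
        }
        throw new FileNotFoundException(path);
    }

    public static void main(String[] args) throws Exception {
        FakeStore store = new FakeStore();
        store.keys.add("data/a.csv");
        store.keys.add("data/b.csv");
        store.keys.add("file.txt");

        System.out.println(listAssumingDirectory(store, "data"));
        System.out.println("HEADs=" + store.heads + " LISTs=" + store.lists);
    }
}
```

      In this sketch the directory case costs one LIST and no HEADs; the file and missing-path cases pay for the failed LIST plus one HEAD, which is the trade-off the proposal accepts because the known callers pass directories.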

      We could also think about doing the same for listStatus.

              People

              • Assignee:
                Unassigned
                Reporter:
                stevel@apache.org Steve Loughran