  1. Hadoop Common
  2. HADOOP-18477 Über-jira: S3A Hadoop 3.3.9 features
  3. HADOOP-14943

Add common getFileBlockLocations() emulation for object stores, including S3A


Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.8.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      It looks suspiciously like S3A isn't providing the partitioning data in listLocatedStatus() and getFileBlockLocations() that is needed to break up a file by the block size. This will stop tools using the MRv1 APIs from partitioning properly if the input format isn't doing its own split logic.

      FileInputFormat in MRv2 is a bit more configurable about input split calculation and will split up large files. Otherwise, the partitioning is driven more by the default values of the executing engine than by any configuration data from the filesystem about what its "block size" is.
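
      To make the role of the block size concrete, here is a minimal sketch of how a split size could be derived from it. It assumes the usual max(minSize, min(maxSize, blockSize)) rule and a 10% slack on the final split, which is how I understand the MRv2 FileInputFormat calculation to work. The class and method names are hypothetical and the code is plain Java with no Hadoop dependency.

```java
/** Hypothetical sketch; plain Java, not Hadoop code. */
final class SplitSizeSketch {

    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** Clamp the filesystem's reported block size between the configured min and max. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Count how many splits a file of the given length would produce. */
    static int countSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        int splits = 0;
        long remaining = fileLength;
        // Carve off full-size splits while more than SPLIT_SLOP splits' worth remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left over becomes the final (possibly oversized) split.
        if (remaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Hypothetical values: a 32 MiB block size and a 1 GiB file.
        long splitSize = computeSplitSize(32 * mib, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(1024 * mib, splitSize));
    }
}
```

      With min and max left at their extremes, the split size is whatever block size the filesystem reports, which is why that value matters to the partitioning.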

      NativeAzureFS does a better job; maybe that could be factored out to hadoop-common and reused?
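
      As a rough illustration of what a shared emulation could look like, the sketch below carves a file into block-sized ranges and attaches a placeholder host to each, since an object store has no real data locality. This is not the code in the attached patches and not the NativeAzureFS logic; the class, the Block type, and the "localhost" placeholder are assumptions made for the example, written in plain Java without Hadoop types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the patch's code and not a Hadoop API. */
final class BlockLocationSketch {

    /** A fake "block": a byte range plus a placeholder host name. */
    static final class Block {
        final long offset;
        final long length;
        final String host;

        Block(long offset, long length, String host) {
            this.offset = offset;
            this.length = length;
            this.host = host;
        }

        @Override
        public String toString() {
            return offset + "+" + length + "@" + host;
        }
    }

    /**
     * Return the emulated blocks that overlap the range [start, start + len).
     * Blocks are aligned to multiples of blockSize; the last one may be short.
     */
    static List<Block> emulate(long fileLength, long blockSize, long start, long len) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (start < 0 || len < 0) {
            throw new IllegalArgumentException("start and len must be non-negative");
        }
        List<Block> blocks = new ArrayList<>();
        if (fileLength <= 0 || start >= fileLength || len == 0) {
            return blocks; // nothing overlaps the requested range
        }
        // Clamp the end of the range to the file length without overflowing.
        long end = (len > fileLength - start) ? fileLength : start + len;
        // Start at the block boundary at or before 'start'.
        long first = (start / blockSize) * blockSize;
        for (long off = first; off < end; off += blockSize) {
            long blockLen = Math.min(blockSize, fileLength - off);
            blocks.add(new Block(off, blockLen, "localhost"));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Hypothetical values: a 300-byte file with a 128-byte block size.
        for (Block b : emulate(300L, 128L, 0L, 300L)) {
            System.out.println(b);
        }
    }
}
```

      Keeping the calculation independent of any one store is what would let it live in hadoop-common and be reused by each object store client with its own configured block size.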

      Attachments

        1. HADOOP-14943-001.patch
          1 kB
          Steve Loughran
        2. HADOOP-14943-002.patch
          20 kB
          Steve Loughran
        3. HADOOP-14943-002.patch
          20 kB
          Steve Loughran
        4. HADOOP-14943-003.patch
          19 kB
          Steve Loughran
        5. HADOOP-14943-004.patch
          19 kB
          Steve Loughran


          People

            Assignee: Steve Loughran (stevel@apache.org)
            Reporter: Steve Loughran (stevel@apache.org)
