  1. Hadoop Common
  2. HADOOP-17566 Über-jira: S3A Hadoop 3.4 features
  3. HADOOP-14943

Add common getFileBlockLocations() emulation for object stores, including S3A


    Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.8.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None

      Description

      It looks suspiciously like S3A isn't providing the partitioning data in listLocatedStatus() and getFileBlockLocations() that is needed to break up a file by the block size. This will stop tools using the MRv1 APIs from partitioning properly if the input format isn't doing its own split logic.

      FileInputFormat in MRv2 is a bit more configurable about input split calculation and will split up large files. Otherwise, the partitioning is driven more by the default values of the executing engine than by any config data from the filesystem about what its "block size" is.
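
      To make the role of the block size concrete, here is a minimal sketch of the split-size rule as I understand it from the MRv2 FileInputFormat: the split size is the block size clamped between a configured minimum and maximum. The class and method names below are hypothetical and self-contained; this is not Hadoop code.

```java
/** Hypothetical illustration of a "clamp the block size" split-size rule. */
public class SplitSizeSketch {

    /**
     * Returns the split size: the block size, bounded below by minSize
     * and above by maxSize. minSize wins if the two bounds conflict.
     */
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 32L * 1024 * 1024; // block size reported by the filesystem
        // With default-style bounds the reported block size decides the split size.
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE));
    }
}
```

      Under this rule, a filesystem that reports an unhelpful block size leaves the minimum and maximum settings as the only effective controls.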

      NativeAzureFS does a better job; maybe that could be factored out to hadoop-common and reused?
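
      A rough sketch of what a shared emulation could look like: carve the file into fixed-size ranges and attach a placeholder host to each, since an object store has no real block placement. Everything here (the class, the Block type, the "localhost" host name) is an assumption for illustration; it is not the code in the attached patches and it does not use the Hadoop BlockLocation class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of block-location emulation for an object store. */
public class BlockLocationSketch {

    /** One emulated block: a byte range of the file plus a placeholder host. */
    public static final class Block {
        public final long offset;
        public final long length;
        public final String host;

        Block(long offset, long length, String host) {
            this.offset = offset;
            this.length = length;
            this.host = host;
        }
    }

    /**
     * Splits a file of fileLength bytes into ranges of at most blockSize bytes.
     * The final range is shorter when the length is not a multiple of the
     * block size; an empty file yields an empty list.
     */
    public static List<Block> emulate(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid length or block size");
        }
        List<Block> blocks = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            long length = Math.min(blockSize, fileLength - offset);
            // Assumed placeholder: every range is reported on the same fake host.
            blocks.add(new Block(offset, length, "localhost"));
            offset += length;
        }
        return blocks;
    }
}
```

      A caller that partitions work by block would then get one range per block-sized chunk instead of a single range covering the whole object.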

        Attachments

        1. HADOOP-14943-001.patch
          1 kB
          Steve Loughran
        2. HADOOP-14943-002.patch
          20 kB
          Steve Loughran
        3. HADOOP-14943-002.patch
          20 kB
          Steve Loughran
        4. HADOOP-14943-003.patch
          19 kB
          Steve Loughran
        5. HADOOP-14943-004.patch
          19 kB
          Steve Loughran

            People

            • Assignee:
              stevel@apache.org Steve Loughran
              Reporter:
              stevel@apache.org Steve Loughran
