  1. Hadoop Common
  2. HADOOP-15620 Über-jira: S3A phase VI: Hadoop 3.3 features
  3. HADOOP-14943

Add common getFileBlockLocations() emulation for object stores, including S3A

    Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None

      Description

      It looks suspiciously like S3A isn't providing the partitioning data in listLocatedStatus() and getFileBlockLocations() that is needed to break up a file by the block size. This will stop tools using the MRv1 APIs from partitioning properly if the input format isn't doing its own split logic.

      FileInputFormat in MRv2 is a bit more configurable about input split calculation and will split up large files. Otherwise, the partitioning is driven more by the default values of the executing engine than by any config data from the filesystem about what its "block size" is.

      NativeAzureFS does a better job; maybe that could be factored out to hadoop-common and reused?
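
      As a rough illustration of what a shared emulation might compute, here is a minimal sketch in plain Java with no Hadoop dependencies. The class BlockLocationSketch, its Block type, the emulate() method and the "localhost" placeholder host are all hypothetical names chosen for this example; they are not taken from the attached patches or from NativeAzureFS. The sketch assumes the store has no real locality, so every emulated block reports the same placeholder host, and it only returns the blocks that overlap the requested byte range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: emulate block locations for a store with no real blocks. */
public class BlockLocationSketch {

    /** One emulated block covering [offset, offset + length) on a placeholder host. */
    public static final class Block {
        public final long offset;
        public final long length;
        public final String host;

        Block(long offset, long length, String host) {
            this.offset = offset;
            this.length = length;
            this.host = host;
        }

        @Override
        public String toString() {
            return offset + "+" + length + "@" + host;
        }
    }

    /**
     * Split the byte range [start, start + len) of a file into fixed-size
     * emulated blocks. Blocks are aligned to multiples of blockSize and the
     * last block is truncated at the end of the file.
     */
    public static List<Block> emulate(long fileLength, long blockSize,
                                      long start, long len) {
        if (start < 0 || len < 0) {
            throw new IllegalArgumentException("start and len must be >= 0");
        }
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be > 0");
        }
        List<Block> blocks = new ArrayList<>();
        if (start >= fileLength || len == 0) {
            return blocks; // nothing in range
        }
        // Clamp the end of the range to the file length, avoiding overflow.
        long end = (len > fileLength - start) ? fileLength : start + len;
        // Begin at the block boundary at or before 'start'.
        long offset = (start / blockSize) * blockSize;
        while (offset < end) {
            long length = Math.min(blockSize, fileLength - offset);
            // Assumption: no locality information, so use a placeholder host.
            blocks.add(new Block(offset, length, "localhost"));
            offset += blockSize;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A 300-byte file with a 128-byte "block size", whole file requested.
        for (Block b : emulate(300, 128, 0, 300)) {
            System.out.println(b);
        }
    }
}
```

      A split calculator that sizes its splits from block locations would then see several block-sized ranges for a large object rather than one range covering the whole file.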

        Attachments

        1. HADOOP-14943-004.patch
          19 kB
          Steve Loughran
        2. HADOOP-14943-003.patch
          19 kB
          Steve Loughran
        3. HADOOP-14943-002.patch
          20 kB
          Steve Loughran
        4. HADOOP-14943-002.patch
          20 kB
          Steve Loughran
        5. HADOOP-14943-001.patch
          1 kB
          Steve Loughran

              People

              • Assignee:
                stevel@apache.org Steve Loughran
                Reporter:
                stevel@apache.org Steve Loughran
              • Votes:
                0
                Watchers:
                9