Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: fs/s3
    • Labels:
      None

      Description

The consequence is that MapReduce is probably not splitting s3a files in the expected way. This is similar to HADOOP-5861 (which was for s3n, though s3n was passing 5 GB rather than 0 for the block size).

      FileInputFormat.getSplits() relies on the FileStatus block size being set:

              if (isSplitable(job, path)) {
                long blockSize = file.getBlockSize();
                long splitSize = computeSplitSize(blockSize, minSize, maxSize);
      
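To see why a zero block size breaks splitting, here is a standalone sketch of the split-size arithmetic. The `computeSplitSize` body mirrors the `Math.max(minSize, Math.min(maxSize, blockSize))` logic in `FileInputFormat`; the default values for min/max split size are assumptions for illustration.

```java
// Standalone sketch of FileInputFormat's split-size math.
public class SplitSizeDemo {
    // Mirrors FileInputFormat.computeSplitSize():
    // max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long minSize = 1L;             // assumed split.minsize default
        long maxSize = Long.MAX_VALUE; // assumed split.maxsize default

        // With a sane block size, the split size tracks the block size:
        System.out.println(computeSplitSize(64L * 1024 * 1024, minSize, maxSize));

        // With the 0 that S3AFileStatus reports, the split size
        // collapses to minSize, so splitting degenerates:
        System.out.println(computeSplitSize(0L, minSize, maxSize));
    }
}
```

With `blockSize == 0`, `min(maxSize, 0)` is 0 and the result is just `minSize`, so the computed split size bears no relation to the file or filesystem.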

      However, S3AFileSystem does not set the FileStatus block size field. From S3AFileStatus.java:

        // Files
        public S3AFileStatus(long length, long modification_time, Path path) {
          super(length, false, 1, 0, modification_time, path);
          isEmptyDirectory = false;
        }
      

      I think it should use S3AFileSystem.getDefaultBlockSize() for each file's block size (where it currently passes 0).

        Attachments

        1. HADOOP-11584.patch
          3 kB
          Brahma Reddy Battula
        2. HADOOP-11584-002.patch
          5 kB
          Brahma Reddy Battula
        3. HADOOP-11584-003.patch
          12 kB
          Steve Loughran

              People

              • Assignee:
                brahmareddy Brahma Reddy Battula
              • Reporter:
                dhecht Dan Hecht
              • Votes:
                0
              • Watchers:
                14