
Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: fs/s3
    • Labels: None

    Description

      The consequence is that MapReduce is probably not splitting s3a files in the expected way. This is similar to HADOOP-5861 (which was for s3n, though s3n was passing 5G rather than 0 for the block size).

      FileInputFormat.getSplits() relies on the FileStatus block size being set:

              if (isSplitable(job, path)) {
                long blockSize = file.getBlockSize();
                long splitSize = computeSplitSize(blockSize, minSize, maxSize);
      

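      To make the consequence concrete, Hadoop's computeSplitSize() is Math.max(minSize, Math.min(maxSize, blockSize)). The sketch below is a standalone copy of that formula (not the Hadoop class itself) showing how a zero block size collapses the split size to the minimum; the 32 MB figure is just an illustrative default.

```java
// Minimal sketch of FileInputFormat.computeSplitSize(), reproduced here
// standalone to show the effect of a zero block size on split calculation.
public class SplitSizeDemo {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        // Hadoop's formula: clamp blockSize into [minSize, maxSize].
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long minSize = 1L;               // default lower bound
        long maxSize = Long.MAX_VALUE;   // default upper bound

        // With block size 0 (what S3AFileStatus currently reports),
        // the split size collapses to minSize.
        System.out.println(computeSplitSize(0L, minSize, maxSize));                  // 1
        // With a sensible default block size (e.g. 32 MB), splits follow it.
        System.out.println(computeSplitSize(32L * 1024 * 1024, minSize, maxSize));   // 33554432
    }
}
```

      So unless a job explicitly sets a minimum split size, a zero block size leaves the split computation with no useful signal.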
      However, S3AFileSystem does not set the FileStatus block size field. From S3AFileStatus.java:

        // Files
        public S3AFileStatus(long length, long modification_time, Path path) {
          super(length, false, 1, 0, modification_time, path);
          isEmptyDirectory = false;
        }
      

      I think it should use S3AFileSystem.getDefaultBlockSize() for each file's block size (where it's currently passing 0).
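      A sketch of what that fix might look like, threading a block size through the file-status constructor instead of hard-coding 0. The class and parameter names below mirror S3AFileStatus but are illustrative only, not the committed patch:

```java
// Hypothetical sketch of the proposed fix: accept the filesystem's default
// block size in the file constructor rather than passing 0. Illustrative
// stand-in for S3AFileStatus, not the actual Hadoop class.
public class S3AFileStatusSketch {
    private final long length;
    private final long blockSize;

    // Files: callers would pass S3AFileSystem.getDefaultBlockSize() here.
    public S3AFileStatusSketch(long length, long modificationTime, long blockSize) {
        this.length = length;
        this.blockSize = blockSize;   // was: 0
    }

    public long getBlockSize() { return blockSize; }
}
```

      The call site in S3AFileSystem would then look something like new S3AFileStatus(len, mtime, path, getDefaultBlockSize()), so FileInputFormat.getSplits() sees a non-zero block size.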

      Attachments

        1. HADOOP-10584-003.patch, 12 kB, Steve Loughran
        2. HADOOP-111584.patch, 3 kB, Brahma Reddy Battula
        3. HADOOP-11584-002.patch, 5 kB, Brahma Reddy Battula

    People

      Assignee: brahmareddy (Brahma Reddy Battula)
      Reporter: dhecht (Daniel Hecht)
      Votes: 0
      Watchers: 13
