[HADOOP-11584] s3a file block size set to 0 in getFileStatus - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.7.0
Component/s: fs/s3
Labels: None

Description

The consequence is that MapReduce is probably not splitting s3a files as expected. This is similar to HADOOP-5861 (which covered s3n, although s3n was passing 5G rather than 0 as the block size).

FileInputFormat.getSplits() relies on the FileStatus block size being set:

        if (isSplitable(job, path)) {
          long blockSize = file.getBlockSize();
          long splitSize = computeSplitSize(blockSize, minSize, maxSize);

However, S3AFileSystem does not set the FileStatus block size field. From S3AFileStatus.java:

  // Files
  public S3AFileStatus(long length, long modification_time, Path path) {
    super(length, false, 1, 0, modification_time, path);
    isEmptyDirectory = false;
  }

I think it should use S3AFileSystem.getDefaultBlockSize() for each file's block size (where it's currently passing 0).
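
To illustrate why a block size of 0 matters, here is a minimal standalone sketch. It does not call Hadoop; it assumes computeSplitSize() reduces to max(minSize, min(maxSize, blockSize)) and that the minimum and maximum split sizes are left at 1 and Long.MAX_VALUE. The class name, the splitCount() helper, the 1 GiB file length, and the 32 MiB block size are all hypothetical and chosen only for illustration.

```java
// Standalone sketch: re-implements the split-size formula locally so it runs
// without Hadoop on the classpath. Not the actual Hadoop source.
class SplitSizeSketch {

    // Assumption: mirrors FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Hypothetical helper: rough split count via ceiling division.
    // Ignores any slop factor the real getSplits() may apply to the last split.
    static long splitCount(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long minSize = 1L;                      // assumed default minimum split size
        long maxSize = Long.MAX_VALUE;          // assumed default maximum split size
        long fileLength = 1024L * 1024 * 1024;  // hypothetical 1 GiB object

        // Block size as reported today for s3a files.
        long withZero = computeSplitSize(0L, minSize, maxSize);
        // Hypothetical non-zero block size, e.g. a filesystem default.
        long withDefault = computeSplitSize(32L * 1024 * 1024, minSize, maxSize);

        System.out.println("blockSize=0     splitSize=" + withZero
                + " splits=" + splitCount(fileLength, withZero));
        System.out.println("blockSize=32MiB splitSize=" + withDefault
                + " splits=" + splitCount(fileLength, withDefault));
    }
}
```

Under these assumptions, a zero block size makes the split size fall back to the minimum split size, so the number of splits depends entirely on how the job configures that minimum rather than on the filesystem.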

Attachments

- HADOOP-10584-003.patch, 17/Feb/15 14:10, 12 kB, Steve Loughran
- HADOOP-111584.patch, 13/Feb/15 08:59, 3 kB, Brahma Reddy Battula
- HADOOP-11584-002.patch, 16/Feb/15 09:57, 5 kB, Brahma Reddy Battula

Issue Links

is related to

- HADOOP-11601 Enhance FS spec & tests to mandate FileStatus.getBlocksize() >0 for non-empty files (Resolved)
- HADOOP-11606 intermittent failure of TestS3AFileSystemContract.testRenameRootDirForbidden (Resolved)

People

Assignee: Brahma Reddy Battula

Reporter: Daniel Hecht

Votes: 0

Watchers: 13

Dates

Created: 11/Feb/15 20:14

Updated: 24/Apr/15 22:49

Resolved: 21/Feb/15 12:30