Details
Type: Sub-task
Status: Patch Available
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.8.1
Fix Version/s: None
Component/s: None
Description
It looks suspiciously like S3A isn't providing the partitioning data in listLocatedStatus() and getFileBlockLocations() needed to break a file up by block size. This will stop tools using the MRv1 APIs from partitioning properly unless the input format implements its own split logic.
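A quick way to see what the filesystem is reporting: a minimal sketch (the bucket and object names below are hypothetical) that prints the block size and block locations S3A returns for a file. If the block size is meaningless, or a single location covers the whole file, split calculation has nothing to work with.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ABlockSizeProbe {
  public static void main(String[] args) throws Exception {
    // Hypothetical bucket/object, purely for illustration.
    Path path = new Path("s3a://example-bucket/data/large-file.csv");
    FileSystem fs = path.getFileSystem(new Configuration());

    FileStatus status = fs.getFileStatus(path);
    System.out.println("reported block size = " + status.getBlockSize());

    // The per-block offsets/lengths here are what MRv1-style clients rely
    // on to break the file up; S3A should be synthesizing them from the
    // configured block size.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset=" + block.getOffset()
          + " length=" + block.getLength());
    }
  }
}
{code}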
FileInputFormat in MRv2 is a bit more configurable about input split calculation and will split up large files, but otherwise the partitioning is driven by the default values of the executing engine rather than by any config data from the filesystem about what its "block size" is.
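For reference, the MRv2 calculation (org.apache.hadoop.mapreduce.lib.input.FileInputFormat#computeSplitSize) boils down to clamping the filesystem-reported block size between the configured minimum and maximum split sizes, roughly:
{code:java}
public class SplitSizeDemo {
  // Paraphrase of FileInputFormat#computeSplitSize: the block size the
  // filesystem reports is the default split size, clamped by the
  // configured min/max split sizes.
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long minSize = 1L;             // effective default of mapreduce.input.fileinputformat.split.minsize
    long maxSize = Long.MAX_VALUE; // default of mapreduce.input.fileinputformat.split.maxsize
    // A sane 128 MB block size gives 128 MB splits.
    System.out.println(computeSplitSize(128L << 20, minSize, maxSize));
    // A block size of 0 from the filesystem degenerates to minSize (1 byte),
    // leaving partitioning governed by whatever the engine configures.
    System.out.println(computeSplitSize(0L, minSize, maxSize));
  }
}
{code}
So unless the engine overrides the min/max split sizes itself, the filesystem's reported block size is what decides the partitioning.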
NativeAzureFS does a better job; maybe that could be factored out to hadoop-common and reused?
Attachments
Issue Links
- breaks
  - SPARK-22240 S3 CSV number of partitions incorrectly computed (Resolved)
- contains
  - HADOOP-15044 Wasb getFileBlockLocations() returns too many locations. (Resolved)
- is depended upon by
  - HADOOP-15132 Über-jira: WASB client phase III: roll-up for Hadoop 3.2 (Open)
- is related to
  - HADOOP-12878 Impersonate hosts in s3a for better data locality handling (Open)
  - HADOOP-15000 s3a new getdefaultblocksize be called in getFileStatus which has not been implemented in s3afilesystem yet (Open)
  - HADOOP-15320 Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake (Resolved)
- relates to
  - HDFS-12831 HDFS throws FileNotFoundException on getFileBlockLocations(path-to-directory) (Patch Available)