Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.8.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

      Description

      Currently, localhost is passed as the locality for each block, causing all blocks involved in a job to initially target the same node (the RM) before being moved by the scheduler to a rack-local node. This reduces parallelism for jobs with short-lived mappers.

      We should mimic Azure's implementation: a config setting fs.s3a.block.location.impersonatedhost where the user can enter the list of hostnames in the cluster to be returned by getFileBlockLocations.

      Possible optimization: for larger systems, it might be better to return N (5?) random hostnames, to avoid passing a huge array (downstream code assumes the array size is O(3), i.e. a typical replica count).
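
      A minimal sketch of what such a helper might look like (the class name, the five-host cap, and the reuse of hostnames as block "names" are illustrative assumptions, not committed code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;

/** Hypothetical helper; all names here are illustrative. */
public class ImpersonatedBlockLocations {
  /** The config key proposed above. */
  public static final String KEY = "fs.s3a.block.location.impersonatedhost";
  /** Cap on returned hosts, per the "N (5?) random hostnames" idea. */
  private static final int MAX_HOSTS = 5;

  public static BlockLocation[] locationsFor(Configuration conf,
      long start, long len) {
    // Fall back to the current behaviour if nothing is configured.
    String[] configured = conf.getTrimmedStrings(KEY, "localhost");
    List<String> hosts = new ArrayList<>(Arrays.asList(configured));
    // Shuffle so successive calls spread across the configured hosts.
    Collections.shuffle(hosts);
    List<String> subset = hosts.subList(0, Math.min(MAX_HOSTS, hosts.size()));
    String[] chosen = subset.toArray(new String[0]);
    // Reuse hostnames as "names" too; a real patch would attach ports.
    return new BlockLocation[] {
        new BlockLocation(chosen, chosen, start, len)
    };
  }
}
```

      Keeping this logic in its own class rather than inline in the filesystem is what makes the isolated unit testing and reuse discussed below possible.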

        Activity

        stevel@apache.org Steve Loughran added a comment -

        I like your idea of picking randomly; if you really wanted to be clever you'd use rack topology, but I don't see that it would gain much. This is about distribution of workload. Provided the returned values were sufficiently random, it would even out.

        Can I propose that the code for this be entirely self-contained; something that s3a can delegate to. This would allow the test to be isolated in a simple unit test, and the code to be reusable elsewhere.
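
        An isolated test along those lines might look like this (assuming the hypothetical ImpersonatedBlockLocations helper sketched under the description):

```java
import static org.junit.Assert.assertTrue;

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.junit.Test;

public class TestImpersonatedBlockLocations {
  @Test
  public void returnsOnlyConfiguredHosts() throws Exception {
    Configuration conf = new Configuration(false);
    conf.set(ImpersonatedBlockLocations.KEY, "host1,host2,host3");
    BlockLocation[] locations =
        ImpersonatedBlockLocations.locationsFor(conf, 0L, 1024L);
    for (String host : locations[0].getHosts()) {
      // Every reported host must come from the configured list.
      assertTrue(Arrays.asList("host1", "host2", "host3").contains(host));
    }
  }
}
```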

        stevel@apache.org Steve Loughran added a comment -

        Link to the azure code here: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L2550-L2589
        dweeks Daniel Weeks added a comment -

        There are a couple problems with this approach:

        1) Impersonating hosts can actually cause scheduling delays, because the scheduler will try to place each task on the impersonated host for locality and delay placing it on other hosts. Since there is no real locality, the delay is unnecessary and can have a significant impact if you have lots of tasks reading from S3 on a busy cluster. Even if you pick a random node (or set of nodes), there's no guarantee it will have available capacity, so you still incur an artificial delay.
        2) This also messes with locality metrics, as some tasks will show up as node- or rack-local but really aren't.

        It might be better to return the address of the S3 endpoint so that it appears off-cluster; at that point the scheduler will not find any locality and will simply schedule on the first available node. I'm not sure whether you actually need a node in the cluster for the block location, though.

        dweeks Daniel Weeks added a comment - edited

        Actually, looking at ResourceRequest, is it possible to use 'ANY' = '*'? It looks like it is intended for exactly this purpose.

        https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java#L131
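
        For context, ANY belongs to the YARN resource-request side rather than to block locations; a minimal sketch of how an application master asks for "any host" today (the resource size and priority are illustrative):

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class AnyHostRequest {
  public static ContainerRequest anyHost() {
    // Passing null for nodes and racks lets the request relax to
    // ResourceRequest.ANY ("*"): the scheduler may use any host.
    return new ContainerRequest(
        Resource.newInstance(1024, 1), // 1 GB, 1 vcore (illustrative)
        null,                          // no preferred nodes
        null,                          // no preferred racks
        Priority.newInstance(0));
  }
}
```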

        stevel@apache.org Steve Loughran added a comment - edited

        The star symbol is certainly used in resource requests... I don't see it being used in any filesystem block location calls. I suspect that any long-lived application will be doing hostname matching on the blocks that come back for locality, and may not work with a "*" (it's not a hostname, after all), or may not treat it that well.

        Assuming the hostname was configurable: you'd give it a list, it'd return a random subset of them, and people could experiment with "*" and "offsite.example.org" to see what worked well.

        cnauroth Chris Nauroth added a comment -

        From reviewing the description, there might be a misunderstanding about the implementation in hadoop-azure.

        We should mimic Azure's implementation: a config setting fs.s3a.block.location.impersonatedhost where the user can enter the list of hostnames in the cluster to be returned by getFileBlockLocations.

        The hadoop-azure implementation currently reports the same host for every block location. That host is configurable and defaults to "localhost". I'm not aware of a way to get it to use a mix of different hostnames.

        What WASB does differently from S3A right now is that it overrides getFileBlockLocations to mimic the concept of block size, using that block size to divide a file and report multiple block locations. For something like MapReduce, that translates to multiple input splits, more map tasks, and a greater opportunity for I/O parallelism on jobs that consume a small number of very large files. S3A instead inherits the getFileBlockLocations implementation from the superclass, which always reports that a file has exactly one block location. That could mean, for example, that S3A would experience a bottleneck on a job whose input is a single very large file, because it would get only one input split.
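
        A paraphrased sketch of that WASB behaviour (simplified from the code linked above; the method and parameter names here are illustrative):

```java
import org.apache.hadoop.fs.BlockLocation;

public class FakeBlockLocations {
  static BlockLocation[] fakeLocations(long fileLen, long blockSize,
      String host, long start, long len) {
    // Indexes of the first and last synthetic "blocks" that the
    // requested range [start, start + len) touches.
    int first = (int) (start / blockSize);
    int last = (int) ((start + len - 1) / blockSize);
    BlockLocation[] locations = new BlockLocation[last - first + 1];
    String[] name = {host};
    for (int i = first; i <= last; i++) {
      long offset = i * blockSize;
      long length = Math.min(blockSize, fileLen - offset);
      // Every synthetic block reports the same single configured host.
      locations[i - first] = new BlockLocation(name, name, offset, length);
    }
    return locations;
  }
}
```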

        If use of the same host name in every block location can cause scheduling bottlenecks at the ResourceManager, then I have to assume that WASB is prone to that same problem today. Echoing Steve's comment, if any code changes done here are self-contained and reusable, then S3A, WASB and any other file system could call it to get the benefits.

        Thomas Demoor Thomas Demoor added a comment -

        Chris Nauroth: I assumed this would be used as follows: by setting fs.wasb/s3a.block.location.impersonatedhost=host1,host2,...,host50 you would impersonate the block as local on all nodes, so the scheduler would be offered options. An optimization would be to return 5 random hosts.
        Daniel Weeks: Thanks for your interest!

        • The current s3a implementation causes scheduling delays as it declares localhost as the only locality for all blocks.
        • The proposed solution can indeed be suboptimal under load but it's always an improvement over the current situation
        • I agree that using "*" would be easier and more correct, but I think Steve Loughran might have a point. Testing...
        • Returning the s3 endpoint so things are always off-rack seems like an interesting idea. Testing...

        Pieter Reuse has verified that what I originally proposed speeds things up. We are checking if any of Daniel's proposals would work out as well. Expect an update next week.

        rdblue Ryan Blue added a comment - edited

        FileInputFormat works slightly differently. First, the split size is calculated from the file's reported block size and the current min and max split sizes. Then the file is broken into N splits of that size, where N = Math.ceil(fileLength / splitSize). The block locations are then used to determine where each split is located, based on the split's starting offset.

        The result is that getFileBlockLocations can return a single location for the entire file and you'll still end up with N roughly block-sized splits. This is what enables you to get more parallelism by setting smaller split sizes, even if the resulting splits don't correspond to different blocks. In our environment, we use a 64MB S3 block size and don't see a bottleneck from one input split per file.
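
        As a worked example (the clamp mirrors FileInputFormat's computeSplitSize in essence; the file and block sizes are illustrative):

```java
public class SplitMath {
  // The clamp FileInputFormat applies (in essence) when sizing splits.
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long fileLength = 1024L << 20;   // a 1 GB input file
    long blockSize = 64L << 20;      // 64 MB reported block size
    long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
    // N = ceil(fileLength / splitSize) = 16 splits, independent of how
    // many locations getFileBlockLocations returned for the file.
    long numSplits = (fileLength + splitSize - 1) / splitSize;
    System.out.println(numSplits);   // prints 16
  }
}
```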

        Thomas Demoor Thomas Demoor added a comment -

        Chris Nauroth pointed out that the issue I originally ran into could also be avoided by setting yarn.scheduler.capacity.node-locality-delay=0. However, there are non-YARN use cases and downstream projects that would benefit from locality faking, so I think this still makes sense.

        stevel@apache.org Steve Loughran added a comment -

        Following on from this: be aware that Hive actively scans for the word "localhost" when querying block locations and interprets that as "anywhere"... it doesn't request locality in any job submissions.


          People

          • Assignee:
            Thomas Demoor
          • Reporter:
            Thomas Demoor
          • Votes:
            1
          • Watchers:
            10

            Dates

            • Created:
              Updated:
