Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
When writing to an S3-based target, the temp-file-and-rename logic in RetriableFileCopyCommand adds unnecessary cost to the job, because a rename in S3 is performed as a server-side copy followed by a delete [1]. The renames are parallelized across all of the DistCp map tasks, which mitigates the severity to some extent. However, a configuration property that lets distributed copies skip the rename and write directly to the target path would improve performance considerably.
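To make the proposal concrete, the following is a minimal sketch of the two code paths, not the actual RetriableFileCopyCommand implementation. It uses java.nio on a local filesystem as a stand-in for the target store. The class name `DirectWriteSketch`, the `directWrite` flag, and the `.distcp.tmp.` prefix are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DirectWriteSketch {

    /**
     * Copies source to target and returns the number of operations issued
     * against the target store (writes plus renames).
     *
     * directWrite is a hypothetical flag standing in for the proposed
     * configuration property; it is not an existing DistCp setting here.
     */
    static int copy(Path source, Path target, boolean directWrite) throws IOException {
        if (directWrite) {
            // Proposed path: a single write straight to the final location.
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return 1;
        }
        // Current path: write to a temp file next to the target, then rename.
        // On a local filesystem the rename is cheap; on S3 it would become
        // a server-side copy followed by a delete.
        Path tmp = target.resolveSibling(".distcp.tmp." + target.getFileName());
        Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        return 2;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("direct-write-sketch");
        Path source = dir.resolve("source.txt");
        Files.write(source, "payload".getBytes(StandardCharsets.UTF_8));

        int viaRename = copy(source, dir.resolve("renamed.txt"), false);
        int direct = copy(source, dir.resolve("direct.txt"), true);

        System.out.println("temp + rename operations: " + viaRename);
        System.out.println("direct write operations:  " + direct);
    }
}
```

The trade-off in this sketch is that the direct path gives up the property that a file only appears at its final location once fully written, which is presumably why the behavior would need to be opt-in.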
Issue Links
- duplicates: HADOOP-15281 Distcp to add no-rename copy option (Resolved)