Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 3.0.0
Description
Hadoop DistCp reuses the same temp file name for every file copied within a task attempt, then renames each temp file to its target name; on S3 that rename is itself a server-side copy. For copies to S3 this causes inconsistency, because S3 guarantees read-after-write consistency only for brand-new objects. The contents of overwritten objects on S3 can likewise be read inconsistently.
To avoid this, we should randomize the temp file name so that each temp file gets a different name.
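
A minimal sketch of the idea, not the actual patch: the class `TempNames`, the method `uniqueTempName`, and the `.distcp.tmp.` prefix are assumptions made for illustration. It appends a random UUID to the task attempt ID so that no two temp files in the same attempt share a name, which means every temp object written to S3 is a brand-new key.

```java
import java.util.UUID;

// Hypothetical helper for illustration only; not DistCp's real code.
final class TempNames {
    // Assumed temp-file prefix; the real prefix may differ.
    private static final String PREFIX = ".distcp.tmp.";

    private TempNames() {}

    // Returns a temp file name that is different on every call,
    // even when the same task attempt ID is passed in repeatedly.
    static String uniqueTempName(String taskAttemptId) {
        return PREFIX + taskAttemptId + "." + UUID.randomUUID();
    }
}
```

Calling `TempNames.uniqueTempName(attemptId)` once per copied file, rather than once per task attempt, would give each file its own temp name. A timestamp or counter suffix could serve the same purpose; a UUID is used here only because it needs no shared state.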
Attachments
Issue Links
- Blocked: HADOOP-16776 backport HADOOP-16775: distcp copies to s3 are randomly corrupted (Resolved)