Hadoop Common / HADOOP-16775

DistCp reuses the same temp file within the task attempt for different files.

    Details

      Description

      Hadoop DistCp reuses the same temp file name for every file copied within a task attempt, then moves each temp file to its target name; on S3 that move is itself a server-side copy. For copies to S3 this causes inconsistency, because S3 is read-after-write consistent only for brand-new objects. The contents of overwritten objects on S3 can also be read inconsistently.

      To avoid this, we should randomize the temp file name so that each temp file gets a different name.
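
A minimal sketch of the proposed idea, not the attached patch: each call builds a fresh temp name by appending a random suffix to the task attempt ID. The class name `TempNameSketch`, the method `tempName`, the `.distcp.tmp.` prefix, and the sample attempt ID are all assumptions made for illustration; the actual change may use a different suffix or naming scheme.

```java
import java.util.UUID;

// Hypothetical helper, not the DistCp source: builds a distinct temp name per copied file.
final class TempNameSketch {
    // Assumed prefix, used here for illustration only.
    private static final String TMP_PREFIX = ".distcp.tmp.";

    // Appends a random UUID so that two files copied by the same task attempt
    // do not share a temp name in practice.
    static String tempName(String taskAttemptId) {
        return TMP_PREFIX + taskAttemptId + "." + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String attempt = "attempt_0001_m_000000_0"; // made-up attempt ID
        // Two calls for the same attempt print two different names.
        System.out.println(tempName(attempt));
        System.out.println(tempName(attempt));
    }
}
```

Because every temp object is then brand new, a write to it never overwrites an earlier temp object from the same attempt, which is the case the description identifies as inconsistent on S3.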

       

        Attachments

        1. HADOOP-16775-v1.patch
          0.9 kB
          Amir Shenavandeh
        2. HADOOP-16775.patch
          0.9 kB
          Amir Shenavandeh

              People

              • Assignee:
                shenavandeh Amir Shenavandeh
                Reporter:
                shenavandeh Amir Shenavandeh