  1. Hadoop Common
  2. HADOOP-16775

DistCp reuses the same temp file within the task attempt for different files.


Details

    Description

      Hadoop DistCp reuses the same temp file name for every file copied within a task attempt, then moves each temp file to its target name; that move is itself a server-side copy. For copies to S3 this causes inconsistency, because S3 provides read-after-write consistency only for brand-new objects. The contents of overwritten objects on S3 can also be read inconsistently.

      To avoid this, we should randomize the temp file name so that each copied file gets its own distinct temp file.
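
      As a rough illustration of the proposal (not the attached patch), the sketch below builds a temp file name that differs for every file, even within the same task attempt. The class and method names are hypothetical, the `.distcp.tmp.` prefix and the sample attempt ID are assumptions made for the example, and a random UUID is just one possible source of uniqueness.

```java
import java.util.UUID;

// Hypothetical helper for illustration only; not part of DistCp's API.
class TempFileNames {

    // Assumed prefix for temp files; adjust to whatever the copy job really uses.
    private static final String TMP_PREFIX = ".distcp.tmp.";

    /**
     * Returns a temp file name that is unique per call, so two files copied
     * by the same task attempt never share a temp object.
     */
    public static String uniqueTempName(String taskAttemptId) {
        return TMP_PREFIX + taskAttemptId + "." + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Made-up task attempt ID, used only to show the shape of the name.
        String attempt = "attempt_0001_m_000000_0";
        System.out.println(uniqueTempName(attempt));
        System.out.println(uniqueTempName(attempt));
    }
}
```

      Because each call yields a fresh name, every temp object written to S3 is a brand-new key, so the copy never overwrites a temp object left by an earlier file in the same attempt.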

       

      Attachments

        1. HADOOP-16775-v1.patch
          0.9 kB
          Amir Shenavandeh
        2. HADOOP-16775.patch
          0.9 kB
          Amir Shenavandeh

            People

              shenavandeh Amir Shenavandeh