Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: distcp
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      1. distcp -update launches a job whenever at least one directory is among the source paths, even when there is nothing to copy.

      HADOOP-5675 made distcp launch a job only when fileCount > 0. HADOOP-5762 changed this condition to fileCount + dirCount > 0 so that empty directories would also be copied to the destination. With -update, however, dirCount is incremented without checking whether the directory already exists at the destination, so the job is launched because dirCount > 0 even though there is nothing to copy. In the -update case, dirCount need not be incremented for a directory that already exists at the destination.
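      A minimal sketch of the counting logic in item 1, written in plain Java so it runs without Hadoop. The class LaunchDecision, the SrcEntry type, and the countDirs/shouldLaunch helpers are hypothetical names, not DistCp's actual code, and fileCount is assumed to already exclude files that -update would skip. The only difference from the described behavior is that, with -update, a directory already present at the destination is not counted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LaunchDecision {
    // Hypothetical model of one source entry, relative to the source root.
    static final class SrcEntry {
        final String relPath;
        final boolean isDir;

        SrcEntry(String relPath, boolean isDir) {
            this.relPath = relPath;
            this.isDir = isDir;
        }
    }

    // Counts directories that still have to be created at the destination.
    // With update == true, a directory that already exists there is skipped
    // (the change proposed in item 1); otherwise every directory is counted.
    static int countDirs(List<SrcEntry> entries, Set<String> existingAtDst, boolean update) {
        int dirCount = 0;
        for (SrcEntry e : entries) {
            if (!e.isDir) {
                continue; // files are tracked separately in fileCount
            }
            if (update && existingAtDst.contains(e.relPath)) {
                continue; // nothing to create for this directory
            }
            dirCount++;
        }
        return dirCount;
    }

    // Launch condition as changed by HADOOP-5762.
    static boolean shouldLaunch(int fileCount, int dirCount) {
        return fileCount + dirCount > 0;
    }

    public static void main(String[] args) {
        List<SrcEntry> src = Arrays.asList(new SrcEntry("logs", true));
        Set<String> dst = new HashSet<>(Arrays.asList("logs"));
        // Assume every file is already up to date, so fileCount is 0.
        System.out.println("with -update: " + shouldLaunch(0, countDirs(src, dst, true)));
        System.out.println("old counting: " + shouldLaunch(0, countDirs(src, dst, false)));
    }
}
```

      With the old counting the existing "logs" directory alone makes fileCount + dirCount equal 1, so a job is launched; with the -update check it stays 0 and no job is needed.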

      2. distcp -update does not skip copying a single file when the destination file already exists.

      When we run

      hadoop distcp -update srcfilename destfilename

      distcp appears to compare the checksums of srcfilename and destfilename/srcfilename, so the copy is not skipped. It should compare the checksums of srcfilename and destfilename.
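      A minimal sketch of the path choice item 2 asks for, assuming the usual copy semantics: when the destination is an existing directory, the file lands at destination/<source name>; otherwise the destination path itself is the target. UpdateTarget and targetFor are hypothetical names, and java.nio.file.Path stands in for Hadoop's org.apache.hadoop.fs.Path; this is not the code from the attached patches.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class UpdateTarget {
    // Hypothetical helper: picks the path whose checksum -update should compare
    // with the source file when the source is a single file.
    // dstIsDir says whether the destination already exists as a directory.
    static Path targetFor(Path srcFile, Path dst, boolean dstIsDir) {
        if (dstIsDir) {
            // Copying into an existing directory: the file lands at dst/<name>.
            return dst.resolve(srcFile.getFileName());
        }
        // Destination is an existing file (or absent): it is the target itself.
        return dst;
    }

    public static void main(String[] args) {
        Path src = Paths.get("/user/a/srcfilename");
        // Case from item 2: destfilename already exists as a regular file.
        System.out.println(targetFor(src, Paths.get("/user/b/destfilename"), false));
        // Contrast: destination is an existing directory.
        System.out.println(targetFor(src, Paths.get("/user/b/dir"), true));
    }
}
```

      With destfilename an existing regular file, targetFor returns destfilename itself, which is the file whose checksum -update should compare against srcfilename.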

      See also MAPREDUCE-644.

      1. d_dirCount_648.patch
        2 kB
        Ravi Gummadi
      2. d_dirCount648.patch
        2 kB
        Ravi Gummadi
      3. d_dirCount648.v1.patch
        2 kB
        Ravi Gummadi
      4. d_648_644.patch
        8 kB
        Ravi Gummadi

          Activity

          Ravi Gummadi created issue -
          Tsz Wo Nicholas Sze made changes -
          Field Original Value New Value
          Link This issue relates to HADOOP-5762 [ HADOOP-5762 ]
          Owen O'Malley made changes -
          Project Hadoop Common [ 12310240 ] Hadoop Map/Reduce [ 12310941 ]
          Key HADOOP-6053 MAPREDUCE-648
          Affects Version/s 0.21.0 [ 12313563 ]
          Issue Type Improvement [ 4 ] Bug [ 1 ]
          Component/s distcp [ 12312902 ]
          Component/s tools/distcp [ 12312387 ]
          Fix Version/s 0.21.0 [ 12313563 ]
          Ravi Gummadi made changes -
          Assignee Ravi Gummadi [ ravidotg ]
          Ravi Gummadi made changes -
          Attachment d_dirCount_648.patch [ 12411777 ]
          Ravi Gummadi made changes -
          Attachment d_dirCount648.patch [ 12418924 ]
          Ravi Gummadi made changes -
          Link This issue blocks MAPREDUCE-644 [ MAPREDUCE-644 ]
          Ravi Gummadi made changes -
          Attachment d_dirCount648.v1.patch [ 12419448 ]
          Ravi Gummadi made changes -
          Attachment d_648_644.patch [ 12419610 ]
          Tsz Wo Nicholas Sze made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Tsz Wo Nicholas Sze made changes -
          Hadoop Flags [Reviewed]
          Fix Version/s 0.21.0 [ 12314045 ]
          Tsz Wo Nicholas Sze made changes -
          Summary distcp -update launches job when there is at least one dir in source paths to be copied, even though there is nothing to copy Two distcp bugs
          Description changed from item 1 alone to items 1 and 2 (both as given under Description above)
          Tsz Wo Nicholas Sze made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Ravi Gummadi
              Reporter:
              Ravi Gummadi
            • Votes:
              0
              Watchers:
              3
