Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-5968

Work directory is not deleted when downloadCacheObject throws IOException

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 1.3.0
    • Component/s: mrv1
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Work directory is not deleted in DistCache if Exception happen in downloadCacheObject. In downloadCacheObject, the cache file will be copied to temporarily work directory first, then the work directory will be renamed to the final directory. If IOException happens during the copy, the work directory will not be deleted. This will cause garbage data left in local disk cache. For example If the MR application use Distributed Cache to send a very large Archive/file(50G), if the disk is full during the copy, then the IOException will be triggered, the work directory will be not deleted or renamed and the work directory will occupy a big chunk of disk space.

      1. MAPREDUCE-5968.branch1_new1.patch
        7 kB
        zhihai xu
      2. MAPREDUCE-5968.branch1_new.patch
        7 kB
        zhihai xu
      3. MAPREDUCE-5968.branch1.patch
        7 kB
        zhihai xu

        Activity

        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12655286/MAPREDUCE-5968.branch1.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4729//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655286/MAPREDUCE-5968.branch1.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4729//console This message is automatically generated.
        Hide
        zhihai xu added a comment -

        Add a unit test to verify the workDir is removed using a Test File System to create a fake IOException to trigger this error condition in this issue. Without the fix in downloadCacheObject, the unit test will fail.

        Show
        zhihai xu added a comment - Add a unit test to verify the workDir is removed using a Test File System to create a fake IOException to trigger this error condition in this issue. Without the fix in downloadCacheObject, the unit test will fail.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12658808/MAPREDUCE-5968.branch1.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4779//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658808/MAPREDUCE-5968.branch1.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4779//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12658867/MAPREDUCE-5968.branch1.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4780//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658867/MAPREDUCE-5968.branch1.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4780//console This message is automatically generated.
        Hide
        Karthik Kambatla (Inactive) added a comment -

        Patch looks mostly good. A couple of minor comments:

        1. Do we need to set delWorkDir to false at both places? The latter is always executed and the former can be skipped.
          +      // promote the output to the final location
          +      if (!localFs.rename(workDir, finalDir)) {
          +        localFs.delete(workDir, true);
          +        delWorkDir = false;
          +        if (!localFs.exists(finalDir)) {
          +          throw new IOException("Failed to promote distributed cache object " +
          +                                workDir + " to " + finalDir);
          +        }
          +        // someone else promoted first
          +        return 0;
          +      }
          +      delWorkDir = false;
          
        2. I understand the "work" comes from how work directory name is generated. Can we create the work directory name in a method that can be accessed from both the production and test code so the test continues to be useful in the future.
          +    String workDir = destination.getParent().toString() + "-work-";
          
        Show
        Karthik Kambatla (Inactive) added a comment - Patch looks mostly good. A couple of minor comments: Do we need to set delWorkDir to false at both places? The latter is always executed and the former can be skipped. + // promote the output to the final location + if (!localFs.rename(workDir, finalDir)) { + localFs.delete(workDir, true ); + delWorkDir = false ; + if (!localFs.exists(finalDir)) { + throw new IOException( "Failed to promote distributed cache object " + + workDir + " to " + finalDir); + } + // someone else promoted first + return 0; + } + delWorkDir = false ; I understand the " work " comes from how work directory name is generated. Can we create the work directory name in a method that can be accessed from both the production and test code so the test continues to be useful in the future. + String workDir = destination.getParent().toString() + "-work-" ;
        Hide
        zhihai xu added a comment -

        thanks for the comments. these are good findings.
        1. Based on your suggestion, I can optimize the code: remove delWorkDir, check whether it exist before delete the work dir in final block.
        2. define "work" as a constant in the class, so it can be reused by both production and test code.

        Show
        zhihai xu added a comment - thanks for the comments. these are good findings. 1. Based on your suggestion, I can optimize the code: remove delWorkDir, check whether it exist before delete the work dir in final block. 2. define " work " as a constant in the class, so it can be reused by both production and test code.
        Hide
        zhihai xu added a comment -

        Hi Karthik,
        I attached new patch to address these issues. please review it.

        thanks
        zhihai

        Show
        zhihai xu added a comment - Hi Karthik, I attached new patch to address these issues. please review it. thanks zhihai
        Hide
        zhihai xu added a comment -

        I just upload a new patch MAPREDUCE-5968.branch1_new1.patch to remove the duplicate localFs.delete(workDir, true);.
        thanks Karthik for your review.

        Show
        zhihai xu added a comment - I just upload a new patch MAPREDUCE-5968 .branch1_new1.patch to remove the duplicate localFs.delete(workDir, true);. thanks Karthik for your review.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12659787/MAPREDUCE-5968.branch1_new1.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4787//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659787/MAPREDUCE-5968.branch1_new1.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4787//console This message is automatically generated.
        Hide
        Karthik Kambatla (Inactive) added a comment -

        Latest patch looks good to me. +1. Committing this.

        Show
        Karthik Kambatla (Inactive) added a comment - Latest patch looks good to me. +1. Committing this.
        Hide
        Karthik Kambatla (Inactive) added a comment -

        Thanks for the contribution, zhihai xu. Just committed this to branch-1.

        Show
        Karthik Kambatla (Inactive) added a comment - Thanks for the contribution, zhihai xu . Just committed this to branch-1.

          People

          • Assignee:
            zhihai xu
            Reporter:
            zhihai xu
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development