Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 0.23.7, 2.1.0-beta
    • Component/s: distcp
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      DistCp wraps the InputStream for each input file it reads in an instance of ThrottledInputStream. This class does not close the wrapped InputStream. RetriableFileCopyCommand guarantees that the ThrottledInputStream gets closed, but without closing the underlying wrapped stream, it still leaks a file handle.
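
The leak described above can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not the actual ThrottledInputStream source. The sketch assumes a wrapper that extends java.io.InputStream directly and keeps the wrapped stream in a field, so that close() does nothing unless it is overridden.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WrapperCloseSketch {

    /** Hypothetical source stream that records whether close() reached it. */
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Wrapper that never closes the stream it wraps. */
    static class LeakyWrapper extends InputStream {
        protected final InputStream wrapped;

        LeakyWrapper(InputStream wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public int read() throws IOException {
            return wrapped.read();
        }
        // No close() override: the inherited InputStream.close() does nothing.
    }

    /** Same wrapper, with close() delegating to the wrapped stream. */
    static class ClosingWrapper extends LeakyWrapper {
        ClosingWrapper(InputStream wrapped) {
            super(wrapped);
        }

        @Override
        public void close() throws IOException {
            wrapped.close();
        }
    }

    /** Reads the stream to the end and returns the number of bytes read. */
    static int drain(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        TrackedStream leakedSource = new TrackedStream(new byte[] {1, 2, 3});
        // try-with-resources closes the wrapper, but the source stays open.
        try (InputStream in = new LeakyWrapper(leakedSource)) {
            drain(in);
        }
        System.out.println("leaky wrapper closed source: " + leakedSource.closed);

        TrackedStream closedSource = new TrackedStream(new byte[] {1, 2, 3});
        // Here closing the wrapper also closes the source.
        try (InputStream in = new ClosingWrapper(closedSource)) {
            drain(in);
        }
        System.out.println("closing wrapper closed source: " + closedSource.closed);
    }
}
```

In this sketch the caller closes the wrapper in both cases, which mirrors the situation in the description: closing the outer stream is not enough unless the wrapper forwards close() to the stream it wraps.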

          Activity

          Chris Nauroth added a comment -

          I discovered this while testing on Windows, where file locking is enforced more strictly. The DistCp tests would fail sporadically due to not being able to delete the temp files. I have a patch in progress.

          Chris Nauroth added a comment -

          Here is a patch with the following changes:

          1. RetriableFileCopyCommand - This is just code clean-up. The copyBytes private method accepted a flag as an argument to control whether or not to close the streams after copying. This method was only ever called from copyToTmpFile with a hard-coded true. I removed the flag from the method signature and changed the code so that it closes the streams unconditionally.
          2. ThrottledInputStream - Override close so that it closes the wrapped stream.
          3. TestIntegration - This code was not creating the target file correctly. target contains a fully qualified path. Inside createFiles, it prepends the test root again. This would be 2 fully qualified paths appended to each other. On Windows, the result would look like C:\project\target\C:\project\target. The second ':' makes the filename invalid.

          With this patch, all DistCp tests pass consistently on Mac and Windows.
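
The TestIntegration problem from item 3 can be sketched with plain strings. TEST_ROOT, naiveTarget, and safeTarget are hypothetical names chosen for illustration. The actual createFiles helper and the way the patch corrected it are not shown in this issue, so the guard below is only one possible approach.

```java
public class QualifiedPathSketch {

    // Hypothetical test root, taken from the Windows example in the comment.
    static final String TEST_ROOT = "C:\\project\\target";

    /** Always prepends the root, even when the entry is already qualified. */
    static String naiveTarget(String entry) {
        return TEST_ROOT + "\\" + entry;
    }

    /** Prepends the root only when the entry is not already under it. */
    static String safeTarget(String entry) {
        if (entry.startsWith(TEST_ROOT)) {
            return entry;
        }
        return TEST_ROOT + "\\" + entry;
    }

    /** Counts ':' characters; the comment notes a second ':' makes the name invalid. */
    static long colonCount(String path) {
        return path.chars().filter(c -> c == ':').count();
    }

    public static void main(String[] args) {
        String qualified = TEST_ROOT;
        // Doubled root, matching the invalid path quoted in the comment.
        System.out.println(naiveTarget(qualified));
        // Already-qualified input is returned unchanged.
        System.out.println(safeTarget(qualified));
        // A relative entry still gets the root prepended.
        System.out.println(safeTarget("file1"));
    }
}
```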

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12573975/MAPREDUCE-5075.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 tests included appear to have a timeout.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3421//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3421//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, Chris!

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3494 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3494/)
          MAPREDUCE-5075. DistCp leaks input file handles since ThrottledInputStream does not close the wrapped InputStream. Contributed by Chris Nauroth (Revision 1458741)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458741
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3495 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3495/)
          Move MAPREDUCE-5075 to 2.0.5-beta in CHANGES.txt. (Revision 1458749)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458749
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1350 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1350/)
          Move MAPREDUCE-5075 to 2.0.5-beta in CHANGES.txt. (Revision 1458749)
          MAPREDUCE-5075. DistCp leaks input file handles since ThrottledInputStream does not close the wrapped InputStream. Contributed by Chris Nauroth (Revision 1458741)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458749
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458741
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1378 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1378/)
          Move MAPREDUCE-5075 to 2.0.5-beta in CHANGES.txt. (Revision 1458749)
          MAPREDUCE-5075. DistCp leaks input file handles since ThrottledInputStream does not close the wrapped InputStream. Contributed by Chris Nauroth (Revision 1458741)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458749
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458741
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #162 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/162/)
          Move MAPREDUCE-5075 to 2.0.5-beta in CHANGES.txt. (Revision 1458749)
          MAPREDUCE-5075. DistCp leaks input file handles since ThrottledInputStream does not close the wrapped InputStream. Contributed by Chris Nauroth (Revision 1458741)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458749
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458741
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          Thomas Graves added a comment -

          I merged this into branch-0.23.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #566 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/566/)
          MAPREDUCE-5075. DistCp leaks input file handles since ThrottledInputStream does not close the wrapped InputStream. (Chris Nauroth via tgraves) (Revision 1461305)

          Result = SUCCESS
          tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1461305
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
          • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java

            People

            • Assignee:
              Chris Nauroth
              Reporter:
              Chris Nauroth
            • Votes:
              0
              Watchers:
              6
