Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: contrib/hdfsproxy, datanode
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed

      Description

      We just saw this error happen in our cluster. If there is a file that has a "+" sign in its name, it is not readable through the HFTP protocol.
      The problem is that when we read a file over HFTP, the name of the file is passed as a request parameter, and the "+" gets decoded into a space on the server side. So the datanode receiving the streamFile request tries to access a file with a space instead of a "+" in the name and does not find that file.

      The proposed solution is to pass the filename as part of the URL, as with all the other HFTP commands, since this is the only place where it is not treated this way. Are there any objections to this?
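
      For illustration, a minimal sketch (hypothetical file name; not Hadoop code) of why the "+" is lost: servlet containers apply application/x-www-form-urlencoded rules to query parameters, so an unescaped "+" in a parameter value comes back as a space, while a "+" escaped as %2B (or carried in the URL path) survives.

          import java.net.URLDecoder;

          public class PlusDecodingDemo {
            public static void main(String[] args) throws Exception {
              // The HFTP client sends the raw file name as a query-string value.
              String rawValue = "/user/foo/a+b.txt";

              // Query parameters are decoded with form-urlencoded rules, so the
              // unescaped '+' turns into a space: the datanode then looks for
              // "/user/foo/a b.txt" and fails.
              System.out.println(URLDecoder.decode(rawValue, "UTF-8"));

              // An escaped '+' (%2B) decodes back to the original character.
              System.out.println(URLDecoder.decode("/user/foo/a%2Bb.txt", "UTF-8"));
            }
          }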

      1. HDFS-1109.patch
        11 kB
        Dmytro Molkov
      2. HDFS-1109.2.patch
        8 kB
        Dmytro Molkov
      3. HDFS-1109.2_y0.20.1xx.patch
        5 kB
        Tsz Wo Nicholas Sze
      4. HDFS-1109.2_y0.20.1xx_incremental.patch
        2 kB
        Rohini Palaniswamy

        Issue Links

          Activity

          Dmytro Molkov added a comment -

          This will also be a server-side-only change. Two servlets are affected by it, FileDataServlet and StreamFile. Since you cannot run a cluster with the namenode and datanode not synced on build version, there is no compatibility break.

          sam rash added a comment -

          Just curious, why not URL-encode the '+'?

          http://blooberry.com/indexdot/html/topics/urlencoding.htm

          %2B would decode to '+'.

          (w/o examples, maybe I don't see the precise problem, sorry)
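
          A small illustration of this suggestion (hypothetical file name; not Hadoop code): java.net.URLEncoder escapes '+' as %2B, so the value round-trips intact through a query parameter.

          import java.net.URLDecoder;
          import java.net.URLEncoder;

          public class EncodePlusDemo {
            public static void main(String[] args) throws Exception {
              String name = "/user/foo/a+b.txt";

              // URLEncoder escapes '+' as %2B (and '/' as %2F), so nothing is
              // ambiguous when the value arrives as a query parameter.
              String encoded = URLEncoder.encode(name, "UTF-8");
              System.out.println(encoded);   // %2Fuser%2Ffoo%2Fa%2Bb.txt

              // The container (or an explicit decode) restores the original name.
              System.out.println(URLDecoder.decode(encoded, "UTF-8"));
            }
          }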

          Dmytro Molkov added a comment -

          Well, it would essentially serve the same purpose. We could either URL-encode and decode the parameter passed to the streamFile servlet, or change it to be part of the URL path instead of a URL parameter. Since all the other HFTP operations pass the path of the file as the URL path, changing the streaming servlet makes it follow the same logic.

          sam rash added a comment -

          Ah, I see. I actually agree that making the file part of the path is a cleaner approach (put the resource there, as the URI spec suggests). I think my question was more about whether there was a way to do it without changing the HFTP API; I think the URL-encoding approach would maintain compatibility. A new client could URL-encode, and by default I believe Jetty will URL-decode the string, hence not requiring a server change.

          That said, I don't object; just throwing the thought out there.

          Dmytro Molkov added a comment -

          This URL request is generated by the namenode as a redirect for the client. The client first contacts the namenode with a path to read, and this path is not URL-decoded when the namenode performs its operations (I was getting file-not-found exceptions when encoding the '+' by hand). The client is then forwarded to a datanode with a new URL where the path is passed as a parameter, and it is not decoded there either. So either way it is a modification of both the namenode and datanode servlets.
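
          To make the two options concrete, a hedged sketch (the host, port, and servlet path are hypothetical; this is not the patch itself) of how the namenode's redirect URL differs between the query-parameter form and the path form:

          public class StreamFileUrlSketch {
            public static void main(String[] args) {
              String datanode = "dn1.example.com:50075";   // hypothetical datanode address
              String path = "/user/foo/a+b.txt";

              // Before the change: the path rides in a query parameter, where an
              // unescaped '+' is later decoded as a space by the servlet container.
              String asParameter = "http://" + datanode + "/streamFile?filename=" + path;

              // After the change: the path is appended to the servlet's URL path,
              // matching how the other HFTP operations carry the file name.
              String asPath = "http://" + datanode + "/streamFile" + path;

              System.out.println(asParameter);
              System.out.println(asPath);
            }
          }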

          Dmytro Molkov added a comment -

          Attaching a patch. It also affects the HDFSProxy code; I would like someone who knows that part to review it.
          It also introduces a TestHftpFileSystem unit test to cover the specific problem this patch solves; since there was no test for HFTP before, it should also serve as an initial test for HFTP in general.
          The patch also fixes the problem introduced by HDFS-991, which threw an NPE when trying to read a file through HFTP (the stream file servlet was getting name.conf from the datanode context).

          Dmytro Molkov added a comment -

          Submitting for tests.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442918/HDFS-1109.patch
          against trunk revision 937914.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 113 release audit warnings (more than the trunk's current 112 warnings).

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/327/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/327/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/327/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/327/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/327/console

          This message is automatically generated.

          Dmytro Molkov added a comment -

          Reattaching the patch with a license header in the new test.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442924/HDFS-1109.2.patch
          against trunk revision 937914.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/328/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/328/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/328/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/328/console

          This message is automatically generated.

          Dmytro Molkov added a comment -

          I cannot see the exact failure of the hdfsproxy test, but it succeeds on my local machine.
          Can someone please review the patch?

          Dmytro Molkov added a comment -

          I just checked: HDFS-330 and HDFS-964 also fail the same hdfsproxy test. I cannot figure out why, though. This means the HDFS-1109 patch is not affecting the test case.

          sam rash added a comment -

          +1. I reviewed the patch and it looks good to me.

          dhruba borthakur added a comment -

          Code looks good. I will commit this by the end of this week unless anybody else has review comments.

          dhruba borthakur added a comment -

          I just resolved this. Thanks Dmytro.

          Tsz Wo Nicholas Sze added a comment -

          HDFS-1109.2_y0.20.1xx.patch: for y0.20.1xx

          Tsz Wo Nicholas Sze added a comment -

          Marking this as an incompatible change since it changed the internal URL query.

          Dmytro Molkov added a comment -

          This change is only incompatible if:
          1) the namenode and datanode are of different versions, or
          2) you are going directly to the datanode.

          Otherwise, since this is a server-only change, the client and server can be
          of different (pre-fix/post-fix) versions.


          Tsz Wo Nicholas Sze added a comment -

          This change is similar to changing DatanodeProtocol. We would definitely mark it as an incompatible change if DatanodeProtocol were changed.

          Rohini Palaniswamy added a comment -

          HDFS-1109.2_y0.20.1xx_incremental.patch: for y0.20.1xx. This is an incremental patch over Nicholas'. It fixes an issue in hdfsproxy where the filepath in streamFile is not picked up.

          Tsz Wo Nicholas Sze added a comment -

          Hi Rohini, the bug you discovered is not specific to y0.20.1xx; trunk also has it. So it is better to create a new JIRA to fix it, since this one was already resolved in May.

          Rohini Palaniswamy added a comment -

          Thanks. Created issue https://issues.apache.org/jira/browse/HDFS-1317 for the fix in HDFSProxy.

          Suresh Srinivas added a comment -

          Go to the namenode web UI -> Browse the filesystem -> click on a file link.

          The URL sends the file name as a URL parameter. However, the change made in this patch (see below) looks for the filename in the URL path.

          Index: src/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java
          ===================================================================
          --- src/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java	(revision 938218)
          +++ src/java/org/apache/hadoop/hdfs/server/datanode/DatanodeJspHelper.java	(working copy)
          @@ -236,7 +236,9 @@
               else
                 startOffset = Long.parseLong(startOffsetStr);
           
          -    final String filename=JspHelper.validatePath(req.getParameter("filename"));
          +    final String filename=JspHelper.validatePath(
          +                          req.getPathInfo() == null ? 
          +                          "/" : req.getPathInfo());
          
          

          Why are we using the URL path instead of the "filename" param?
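
          (For illustration, a hedged sketch of one way the servlet could accept both forms, preferring the path info but falling back to the legacy "filename" parameter; this is not the committed fix.)

          import javax.servlet.http.HttpServletRequest;

          public class FilenameResolverSketch {
            // Hypothetical helper: use the path-info form introduced by this patch,
            // but fall back to the "filename" query parameter so links generated by
            // the browse UI keep working.
            static String resolveFilename(HttpServletRequest req) {
              String pathInfo = req.getPathInfo();
              if (pathInfo != null && !pathInfo.isEmpty()) {
                return pathInfo;
              }
              String param = req.getParameter("filename");
              return (param != null) ? param : "/";
            }
          }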

          Eli Collins added a comment -

          Linking in MAPREDUCE-2052 and MAPREDUCE-1378, which are relevant.

          Dmytro Molkov added a comment -

          Suresh, the reason for this patch was the way URL encoding and decoding work.
          We had files with whitespace in the name; when the request got to the datanode, the filename property had a '+' instead of the whitespace (it was URL-encoded but not decoded). As a result the datanode could not read that file and the HFTP call failed.
          Everywhere else we use the URL path to pass the HDFS path around, except for this one case. Changing this one place deals with the URL encoding, and the path of the file is passed around properly in the HFTP protocol. Hope that answers your question.

          Suresh Srinivas added a comment -

          I understood that. The problem is that the filename is being passed in the "filename" URL parameter (see my comment on how to reproduce the problem), while the code is using req.getPathInfo(), which is null. Hence browsing file content does not work any more!

          Dmytro Molkov added a comment -

          Oh, this part is a bug; I never realized that the web UI was going after that URL directly instead of using the HFTP-style call to the namenode to get redirected.
          Should I open a JIRA to fix it, or do you want to reopen this one?

          Suresh Srinivas added a comment -

          We can do it in a separate JIRA, marking the new one as related to this one.

          Jitendra Nath Pandey added a comment -

          Dmytro, is there a separate JIRA created to fix this issue? This is a blocker for 0.22.

          Bill Au added a comment -

          Anyone out there with a version of this patch for 0.20.2?

          Eli Collins added a comment -

          Does HDFS-1317 fix all the remaining issues? It looks like DfsServlet is still generating redirects using the filename parameter.

          Eli Collins added a comment -

          Filed HDFS-2236 for the remaining places that use the filename parameter.

          Eli Collins added a comment -

          HDFS-1575 fixed the issue viewing blocks from the Web UI.

          TaoZhang added a comment -

          I used distcp to copy data from Hadoop 1.0.3 (via HFTP) to Hadoop 2.0.1 (HDFS).
          When the file path (or file name) contains a Chinese character, an exception is thrown.

          I also tried distcp between two 1.0.3 clusters and between two 2.0.1 clusters.
          Both failed.

          The path contains a Chinese character.

          ----------------------
          1.0.3 HFTP to 1.0.3 HDFS; the exception is below.

          12/08/29 00:24:23 INFO tools.DistCp: sourcePathsCount=2
          12/08/29 00:24:23 INFO tools.DistCp: filesToCopyCount=1
          12/08/29 00:24:23 INFO tools.DistCp: bytesToCopyCount=1.2k
          12/08/29 00:24:24 INFO mapred.JobClient: Running job: job_201208101345_2203
          12/08/29 00:24:25 INFO mapred.JobClient: map 0% reduce 0%
          12/08/29 00:24:46 INFO mapred.JobClient: Task Id : attempt_201208101345_2203_m_000000_0, Status : FAILED
          java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
          at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
          at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
          at org.apache.hadoop.mapred.Child.main(Child.java:249)

          12/08/29 00:25:04 INFO mapred.JobClient: Task Id : attempt_201208101345_2203_m_000000_1, Status : FAILED
          java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
          at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
          at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
          at org.apache.hadoop.mapred.Child.main(Child.java:249)

          12/08/29 00:25:19 INFO mapred.JobClient: Task Id : attempt_201208101345_2203_m_000000_2, Status : FAILED
          java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
          at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
          at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
          at org.apache.hadoop.mapred.Child.main(Child.java:249)

          12/08/29 00:25:40 INFO mapred.JobClient: Job complete: job_201208101345_2203
          12/08/29 00:25:40 INFO mapred.JobClient: Counters: 6
          12/08/29 00:25:40 INFO mapred.JobClient: Job Counters
          12/08/29 00:25:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=66844
          12/08/29 00:25:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
          12/08/29 00:25:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
          12/08/29 00:25:40 INFO mapred.JobClient: Launched map tasks=4
          12/08/29 00:25:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
          12/08/29 00:25:40 INFO mapred.JobClient: Failed map tasks=1
          12/08/29 00:25:40 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201208101345_2203_m_000000
          With failures, global counters are inaccurate; consider running with -i
          Copy failed: java.io.IOException: Job failed!
          at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
          at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
          at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
          at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

          -----------------------------------------
          2.0.1 HFTP to 2.0.1 HDFS; the exception is below.

          12/08/29 00:20:06 INFO tools.DistCp: DistCp job-id: job_1345831938927_0043
          12/08/29 00:20:06 INFO mapreduce.Job: Running job: job_1345831938927_0043
          12/08/29 00:20:14 INFO mapreduce.Job: Job job_1345831938927_0043 running in uber mode : false
          12/08/29 00:20:14 INFO mapreduce.Job: map 0% reduce 0%
          12/08/29 00:20:23 INFO mapreduce.Job: Task Id : attempt_1345831938927_0043_m_000000_0, Status : FAILED
          Error: java.io.IOException: File copy failed: hftp://baby20:50070/tmp/??.log/add.csv --> hdfs://baby20:54310/tmp4/add.csv
          at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
          at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
          at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
          at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
          at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
          Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://baby20:50070/tmp/中文.log/add.csv to hdfs://baby20:54310/tmp4/add.csv
          at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
          at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
          ... 10 more
          Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: HTTP_OK expected, received 400
          at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:201)
          at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:167)
          at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToTmpFile(RetriableFileCopyCommand.java:112)
          at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:90)
          at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:71)
          at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
          ... 11 more
          Caused by: java.io.IOException: HTTP_OK expected, received 400
          at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderInputStream.checkResponseCode(HftpFileSystem.java:381)
          at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:121)
          at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
          at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:158)
          at java.io.DataInputStream.read(DataInputStream.java:132)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
          at java.io.FilterInputStream.read(FilterInputStream.java:90)
          at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:70)
          at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:198)
          ... 16 more

          12/08/29 00:20:23 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://baby19:8080/tasklog?plaintext=true&attemptid=attempt_1345831938927_0043_m_000000_0&filter=stdout
          12/08/29 00:20:23 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://baby19:8080/tasklog?plaintext=true&attemptid=attempt_1345831938927_0043_m_000000_0&filter=stderr


            People

            • Assignee:
              Dmytro Molkov
            • Reporter:
              Dmytro Molkov
            • Votes:
              0
            • Watchers:
              10
