Hadoop HDFS / HDFS-3788

distcp can't copy large files using webhdfs due to missing Content-Length header

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.3, 2.0.0-alpha
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: webhdfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The following command fails when data1 contains a 3GB file. It passes when using hftp, or when the directory contains only smaller (<2GB) files, so this looks like a webhdfs issue with large files.

      hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2

      Attachments

      1. h3788_20120816.patch
        4 kB
        Tsz Wo Nicholas Sze
      2. h3788_20120815.patch
        4 kB
        Tsz Wo Nicholas Sze
      3. 20120814NullEntity.patch
        11 kB
        Tsz Wo Nicholas Sze
      4. h3788_20120814b.patch
        4 kB
        Tsz Wo Nicholas Sze
      5. h3788_20120814.patch
        4 kB
        Tsz Wo Nicholas Sze
      6. h3788_20120813.patch
        4 kB
        Tsz Wo Nicholas Sze
      7. distcp-webhdfs-errors.txt
        12 kB
        Eli Collins

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #347 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/347/)
          svn merge -c 1374122. FIXES: HDFS-3788. ByteRangeInputStream should not expect HTTP Content-Length header when chunked transfer-encoding is used. (Revision 1374406)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374406
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/ByteRangeInputStream.java
          Daryn Sharp added a comment -

          Merged to 23.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1169 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1169/)
          HDFS-3788. ByteRangeInputStream should not expect HTTP Content-Length header when chunked transfer-encoding is used. (Revision 1374122)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374122
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/ByteRangeInputStream.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1137 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1137/)
          HDFS-3788. ByteRangeInputStream should not expect HTTP Content-Length header when chunked transfer-encoding is used. (Revision 1374122)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374122
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/ByteRangeInputStream.java
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541317/h3788_20120816.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestLeaseRenewer

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3037//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3037//console

          This message is automatically generated.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2622 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2622/)
          HDFS-3788. ByteRangeInputStream should not expect HTTP Content-Length header when chunked transfer-encoding is used. (Revision 1374122)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374122
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/ByteRangeInputStream.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2657 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2657/)
          HDFS-3788. ByteRangeInputStream should not expect HTTP Content-Length header when chunked transfer-encoding is used. (Revision 1374122)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374122
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/ByteRangeInputStream.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2592 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2592/)
          HDFS-3788. ByteRangeInputStream should not expect HTTP Content-Length header when chunked transfer-encoding is used. (Revision 1374122)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1374122
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/ByteRangeInputStream.java
          Tsz Wo Nicholas Sze added a comment -

          I have committed this.

          Tsz Wo Nicholas Sze added a comment -

          Fair enough. I added spaces in the comments so that they are consistent with the existing code. I did not change the for-loops, since no existing for-loop has the space.

          h3788_20120816.patch

          Eli Collins added a comment -

          > I think we don't have a standard in our code for this kind of style. So I will leave them as-is.

          All the surrounding conditionals and loops have spaces, as does the vast majority of the rest of the code base, including the new code you're adding in this patch. Good to be consistent even within your own patches, right?

          Otherwise +1, looks great.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541185/h3788_20120815.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3022//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3022//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          h3788_20120815.patch: use Long for file length.

          Tsz Wo Nicholas Sze added a comment -

          Eli, thanks for reviewing the patch!

          > Where are we setting chunked encoding on the read path? I only see that on the create path

          We don't have to do anything. The library handles it.

          > Wouldn't it make sense to initialize filelength to -1 rather than 0 now that we're using -1 to be undefined?

          Let's use Long instead of long. Then, we can initialize it to null.

          > Style nit: there are missing spaces, eg "//file", "for(String", "for(file"

          I think we don't have a standard in our code for this kind of style. So I will leave them as-is.
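
          For illustration, here is a minimal sketch of the two sentinel styles discussed in this exchange (class and field names are hypothetical, not from the patch):

            class WithPrimitive {
              private long fileLength = -1; // magic value: -1 means "unknown"

              boolean isLengthKnown() {
                return fileLength >= 0;
              }
            }

            class WithBoxed {
              private Long fileLength = null; // null means "unknown"; no magic value needed

              boolean isLengthKnown() {
                return fileLength != null;
              }
            }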

          Eli Collins added a comment -

          Forgot to mention that I tested your patch and the 3gb distcp now works for me. And thanks for working on this Nicholas!

          Eli Collins added a comment -
          • Where are we setting chunked encoding on the read path? I only see that on the create path
          • Wouldn't it make sense to initialize filelength to -1 rather than 0, now that we're using -1 to mean undefined?
          • Style nit: there are missing spaces, eg "//file", "for(String", "for(file"
          Daryn Sharp added a comment -

          You're right Nicholas, both headers aren't supposed to be sent, and chunked takes priority if they are.

          Pure suggestion, but maybe change the length comparison from filelength >= 0 to filelength != -1, or better yet use a boolean. I don't particularly mind, though.

          I saw that TestByteRangeInputStream#testByteRange uses a spied connection object. I didn't know if the same technique could be used. Unfortunately I have zero time to investigate...

          +1 It'd be a shame to have no test, but I'll defer to your discretion.
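
          For reference, a rough sketch of the spied-connection technique mentioned above, assuming JUnit 4 and Mockito on the classpath (the isChunked helper is a hypothetical stand-in for the patched check, not the actual ByteRangeInputStream code):

            import static org.junit.Assert.assertTrue;
            import static org.mockito.Mockito.mock;
            import static org.mockito.Mockito.when;

            import java.net.HttpURLConnection;
            import org.junit.Test;

            public class TestChunkedResponse {
              // Hypothetical stand-in for the patched check.
              private static boolean isChunked(HttpURLConnection conn) {
                final String v = conn.getHeaderField("Transfer-Encoding");
                return v != null && v.toLowerCase().contains("chunked");
              }

              @Test
              public void chunkedResponseNeedsNoContentLength() {
                // Mock a response that is chunked and therefore has no Content-Length.
                final HttpURLConnection conn = mock(HttpURLConnection.class);
                when(conn.getHeaderField("Transfer-Encoding")).thenReturn("chunked");
                when(conn.getHeaderField("Content-Length")).thenReturn(null);
                // The client must not demand Content-Length in this case.
                assertTrue(isChunked(conn));
              }
            }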

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541020/20120814NullEntity.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3012//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3012//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          20120814NullEntity.patch:

          This is the mock test I mentioned. I used it to figure out that the web server switches to chunked transfer encoding when the size is >= 2GB, but not at 2GB - 1. Unfortunately, the patch adds some test code to DatanodeWebHdfsMethods, so it won't be committed.
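
          A plausible explanation for that boundary, offered here as an assumption rather than something stated in the patch: the servlet API declares fixed lengths with an int, so 2GB - 1 (Integer.MAX_VALUE) is the largest size a servlet can announce before the container falls back to chunked encoding.

            public class TwoGbBoundary {
              public static void main(String[] args) {
                // javax.servlet.ServletResponse#setContentLength takes an int, so the
                // largest fixed Content-Length a servlet can declare is Integer.MAX_VALUE.
                final long twoGb = 2L * 1024 * 1024 * 1024;          // 2147483648
                System.out.println(Integer.MAX_VALUE);               // 2147483647 == 2GB - 1
                System.out.println(twoGb - 1 <= Integer.MAX_VALUE);  // true: fits, fixed length
                System.out.println(twoGb <= Integer.MAX_VALUE);      // false: chunked fallback
              }
            }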

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12540990/h3788_20120814b.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3009//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3009//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          h3788_20120814b.patch: fixes a NPE bug in previous patch.

          Tsz Wo Nicholas Sze added a comment -

          > I believe it's legitimate to send a content-length (if known) with a chunked response. ...

          I believe it is not. See below from http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4

          The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present)

          > I think a test case would be invaluable since the file size issue has reared itself a few times. Could you add a test that uses a mock?

          I did have a mock test but it requires changing DatanodeWebHdfsMethods. I don't see an easy way to have mock tests without changing the main code. Do you have any idea? If yes, could you add the tests?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12540898/h3788_20120814.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.web.TestWebHDFS
          org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
          org.apache.hadoop.hdfs.TestHftpFileSystem
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
          org.apache.hadoop.hdfs.TestByteRangeInputStream
          org.apache.hadoop.hdfs.TestDFSShell
          org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs
          org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3005//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3005//console

          This message is automatically generated.

          Daryn Sharp added a comment -

          I believe it's legitimate to send a content-length (if known) with a chunked response. You may want to check for chunking only if there's not a content-length. It's up to you.

          I think a test case would be invaluable since the file size issue has reared itself a few times. Could you add a test that uses a mock?

          Jason Lowe added a comment -

          I tested the patch on trunk and am unable to reproduce Eli's issue. Without the patch, both -get and distcp via webhdfs fail; with the patch, I can successfully -get and distcp large files. This is on a pseudo-distributed tarball install without security; the distcp command is hadoop distcp webhdfs://localhost:50070/user/someuser/distcpsrc hdfs://localhost:8020/user/someuser/distcpdest, where distcpsrc/ contains a 3GB file.

          Eli Collins added a comment -

          Do you mean "fs -get"? No way, they should have the same code path. Are you sure that both server and client were running trunk?

          Yes, hadoop fs -get of a 3gb file works but distcp of the directory containing that file fails. And yes, using a trunk build for everything, just running this via a pseudo distributed tarball install on my laptop.

          Can you explain what the bug is and the relevant fix? I don't see why we were not setting the content length header as we do that unconditionally on the server side.

          Tsz Wo Nicholas Sze added a comment -

          Thanks Daryn for taking a look. Here is a new patch: h3788_20120814.patch

          I have run 3GB file test included in the previous patch. The test will not be committed since it takes 10 minutes.

          Daryn Sharp added a comment -

          Nicholas: How about first check the transfer-encoding, if it is chunked, then no content-length check?

          Exactly. However, you need to update the patch to check both "Transfer-Encoding" and "TE" headers, and the headers may contain multiple comma-separated values. I haven't tested, but I would expect Java's input stream for chunked responses to throw an EOF exception if the connection is broken, so you might want to add a test for that.

          Eli: Note that a get of a 3GB file works but not distcp; what path is different?

          The code paths should be identical since it's the creation of the input stream that does the content-length check. I can't see how distcp could possibly work unless distcp is not using the filesystem class...
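
          A minimal sketch of the header parsing Daryn suggests above, assuming a plain java.net.HttpURLConnection (the isChunked helper name is hypothetical):

            import java.net.HttpURLConnection;

            class ChunkedCheck {
              // Check both headers and tolerate comma-separated value lists,
              // e.g. "Transfer-Encoding: gzip, chunked".
              static boolean isChunked(HttpURLConnection conn) {
                for (String header : new String[] {"Transfer-Encoding", "TE"}) {
                  final String value = conn.getHeaderField(header);
                  if (value == null) {
                    continue;
                  }
                  for (String token : value.split(",")) {
                    if ("chunked".equalsIgnoreCase(token.trim())) {
                      return true;
                    }
                  }
                }
                return false;
              }
            }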

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12540829/h3788_20120813.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestFileAppend4

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2999//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2999//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          Do you mean "fs -get"? No way, they should have the same code path. Are you sure that both server and client were running trunk?

          Eli Collins added a comment -

          Note that a get of a 3GB file works but not distcp; what path is different?

          Tsz Wo Nicholas Sze added a comment -

          h3788_20120813.patch: check content-length only for non-chunked transfer encoding.
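
          A minimal sketch of that approach, assuming a plain HttpURLConnection (illustrative class and method names; this is not the actual ByteRangeInputStream change):

            import java.io.IOException;
            import java.net.HttpURLConnection;

            class ContentLengthCheck {
              // Returns the declared length, or null when the response is chunked
              // and the length is therefore unknown until end-of-stream.
              static Long getStreamLength(HttpURLConnection conn) throws IOException {
                final String encoding = conn.getHeaderField("Transfer-Encoding");
                if (encoding != null && encoding.toLowerCase().contains("chunked")) {
                  return null; // chunk framing, not Content-Length, marks the end
                }
                final String length = conn.getHeaderField("Content-Length");
                if (length == null) {
                  throw new IOException("Content-Length header is missing");
                }
                return Long.parseLong(length);
              }
            }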

          Tsz Wo Nicholas Sze added a comment -

          How about first checking the transfer-encoding: if it is chunked, then skip the content-length check?

          Daryn Sharp added a comment -

          I'll add that if you just remove the content-length check, and the response is not chunked, the HTTP timeouts will abort the download.

          Daryn Sharp added a comment -

          The problem is complex because we need to support multiple grid versions:

          • you need either the content-length or chunking to reliably know when the file has been fully read
          • if the response isn't chunked and there's no content-length, the client needs to obtain the content-length by other means, such as a file stat

          Based on a quick glance, it looks like the current streaming servlet is explicitly setting the content-length to 0. (That seems wrong, because it's not an empty file.) The puzzling part is that I don't know how it works at all for either <2GB or >2GB! Java must be implicitly setting the content-length when the stream is <2GB.
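
          A sketch of the file-stat fallback described in the second bullet above; FileSystem#getFileStatus and FileStatus#getLen are real Hadoop APIs, while the surrounding method is hypothetical:

            import java.io.IOException;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;

            class LengthFallback {
              // If the response is neither chunked nor carries Content-Length,
              // recover the expected length from file metadata instead.
              static long resolveLength(FileSystem fs, Path path, Long headerLength)
                  throws IOException {
                if (headerLength != null) {
                  return headerLength;
                }
                return fs.getFileStatus(path).getLen();
              }
            }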

          Jason Lowe added a comment -

          Oops, the -get issue is really related to HDFS-3671 and not this issue. Sorry for the spam.

          Jason Lowe added a comment -

          A -get of a large file also fails, but it works on smaller files:

          $ hadoop fs -ls bigfile
          Found 1 items
          -rw-r--r--   3 someuser hdfs 3246391296 2012-08-13 15:04 bigfile
          $ hadoop fs -get webhdfs://clusternn:50070/user/someuser/bigfile
          get: Content-Length header is missing
          
          Jason Lowe added a comment -

          This affects 0.23 as well.

          Eli Collins added a comment -

          Correct, this is a different issue from HDFS-3671.

          Tsz Wo Nicholas Sze added a comment -

          Actually, this is slightly different, since we could add the Content-Length header on the server side. However, that won't work for older servers that do not send Content-Length.

          Tsz Wo Nicholas Sze added a comment -

          Is this the same as HDFS-3671?

          Eli Collins added a comment -

          Full logs attached.


            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Eli Collins
            • Votes:
              0
              Watchers:
              9
