HDFS-4551

Change WebHDFS buffersize behavior to improve default performance

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.2
    • Fix Version/s: 1.2.0
    • Component/s: webhdfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently on the 1.x branch, the buffer size used to copy bytes to the network defaults to io.file.buffer.size. This causes performance problems if that buffer size is large.
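
      A minimal sketch of the behavior described, assuming a simple streaming loop (the class and method names are illustrative, not the actual branch-1 servlet code; Configuration.getInt and the io.file.buffer.size key are real Hadoop API):

      import java.io.IOException;
      import java.io.InputStream;
      import java.io.OutputStream;

      import org.apache.hadoop.conf.Configuration;

      public class CopyBufferSketch {
        // Pre-patch behavior: the network-copy buffer is sized from
        // io.file.buffer.size, so a 64KB server setting yields a 64KB copy buffer.
        static void streamToClient(InputStream in, OutputStream out,
            Configuration conf) throws IOException {
          int bufferSize = conf.getInt("io.file.buffer.size", 4096);
          byte[] buf = new byte[bufferSize];
          int n;
          while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
          }
        }
      }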

      Attachments

      1. HDFS-4551.1.patch
        0.8 kB
        Mark Wagner


          Activity

          Matt Foley added a comment -

          Closed upon release of Hadoop 1.2.0.

          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, Mark!

          Tsz Wo Nicholas Sze added a comment -

          Sure, please feel free to file another JIRA if you find any further improvement. Thanks a lot for the good work!

          +1 patch looks good.

          Mark Wagner added a comment -

          I haven't done any real testing for different values of io.file.buffer.size, but it seemed that 4KB and 64KB had similar performance (at least when pulling over WebHDFS). They may be different under load, though. I can look at this more, but I think determining a good value for that parameter is outside the scope of this JIRA. Would you agree?

          Tsz Wo Nicholas Sze added a comment -

          Then, how is the performance of 4KB? Is it similar to 64KB? If not, we probably should increase the value. Could you check it? Thank you in advance!

          Mark Wagner added a comment -

          I'm not sure what's best, but io.file.buffer.size=64KB is the setting on which I first noticed this. Although it defaults to 4KB, everything I've seen recommends values from 16KB to 128KB, so I don't think 64KB is an unusual choice.

          Tsz Wo Nicholas Sze added a comment -

          I see. We probably should do the same in branch-1.

          > ... when the server has set io.file.buffer.size at 64kB ...

          Do you mean 4KB? Or should we actually set the buffer size to 64KB?

          Mark Wagner added a comment -

          Hi Nicholas,

          I have observed a significant performance increase (7-8x) when copying a 1GB file with the server's io.file.buffer.size set to 64kB (and the buffer size not specified in the request). Of course you can manually set the buffer size to 4096 bytes, but then that also affects the buffer size used to open the file.

          I think we may have gotten crossed up about what is changing. This patch only changes the buffer size used to copy from the FileInputStream onto the network. My understanding is that both WebHDFS and hftp on trunk eventually end up at:

          IOUtils.java

          public static void copyBytes(InputStream in, OutputStream out, long count,
              boolean close) throws IOException {
            byte buf[] = new byte[4096];
            long bytesRemaining = count;
            ...

          which is what this patch is trying to match. Is that your understanding also? There's an argument to be made that this should be configurable, but I figured it best to copy what trunk does.
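
          The quoted method is truncated; presumably it continues by draining count bytes through the fixed 4KB buffer, along these lines (a reconstruction, not the verbatim trunk source):

            while (bytesRemaining > 0) {
              int bytesToRead = (int) Math.min(bytesRemaining, buf.length);
              int bytesRead = in.read(buf, 0, bytesToRead);
              if (bytesRead == -1) {
                break; // input exhausted before count bytes were copied
              }
              out.write(buf, 0, bytesRead);
              bytesRemaining -= bytesRead;
            }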

          Tsz Wo Nicholas Sze added a comment -

          Hi Mark,

          Currently, WebHDFS on branch-1 first uses the user parameter, then the server config value, and then the default value (4096) for the buffer size. WebHDFS on trunk does the same (i.e. (1) user parameter, (2) server conf, (3) default 4096). For hftp, it first uses the server config value and then the default value.

          Are you sure that changing the buffer size to 4096 helps the performance? Have you done any benchmark?
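
          The resolution order Nicholas describes, as a sketch (the helper and its parameters are hypothetical, introduced only for illustration; Configuration.getInt and the io.file.buffer.size key are real Hadoop API):

            // (1) user request parameter, else (2) server conf, else (3) default 4096.
            static int resolveBufferSize(Integer userParam, Configuration serverConf) {
              if (userParam != null) {
                return userParam;
              }
              return serverConf.getInt("io.file.buffer.size", 4096);
            }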

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12572019/HDFS-4551.1.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4038//console

          This message is automatically generated.

          Mark Wagner added a comment -

          I've attached a patch which hardcodes the copy buffer size to 4096. This matches the behavior of hftp and WebHDFS on trunk.
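
          In effect, the change is presumably along these lines (the real diff is the 0.8 kB attachment above; the IOUtils.copyBytes(in, out, buffSize, close) overload is real Hadoop API, while the surrounding class and call sites are illustrative):

            import java.io.IOException;
            import java.io.InputStream;
            import java.io.OutputStream;

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.io.IOUtils;

            class PatchSketch {
              // Before: copy-to-network buffer sized from the server's config.
              static void before(InputStream in, OutputStream out, Configuration conf)
                  throws IOException {
                IOUtils.copyBytes(in, out, conf.getInt("io.file.buffer.size", 4096), true);
              }

              // After: a fixed 4096-byte copy buffer, matching hftp and trunk WebHDFS.
              static void after(InputStream in, OutputStream out) throws IOException {
                IOUtils.copyBytes(in, out, 4096, true);
              }
            }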


            People

            • Assignee:
              Mark Wagner
            • Reporter:
              Mark Wagner
            • Votes:
              0
            • Watchers:
              6
