Hadoop HDFS / HDFS-4551

Change WebHDFS buffersize behavior to improve default performance

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.2
    • Fix Version/s: 1.2.0
    • Component/s: webhdfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently on the 1.x branch, the buffer size used to copy bytes to the network defaults to io.file.buffer.size. This causes performance problems if that buffer size is large.
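
      A minimal sketch of the behavior being described, assuming the copy buffer is sized from the server configuration (Configuration and the io.file.buffer.size key are real Hadoop names; the surrounding method is illustrative):

          import java.io.IOException;
          import java.io.InputStream;
          import java.io.OutputStream;
          import org.apache.hadoop.conf.Configuration;

          class CopyBufferSketch {
            // Pre-patch behavior: the copy buffer inherits io.file.buffer.size,
            // so a server tuned to 64 kB or more for file I/O also uses that
            // size for each chunked write onto the network.
            static void copy(InputStream in, OutputStream out, Configuration conf)
                throws IOException {
              byte[] buf = new byte[conf.getInt("io.file.buffer.size", 4096)];
              int n;
              while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
              }
            }
          }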

      Attachments

      1. HDFS-4551.1.patch
        0.8 kB
        Mark Wagner

        Activity

        Mark Wagner added a comment -

        I've attached a patch which hardcodes the copy buffer size to 4096. This matches the behavior of hftp and WebHDFS on trunk.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12572019/HDFS-4551.1.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4038//console

        This message is automatically generated.

        Tsz Wo Nicholas Sze added a comment -

        Hi Mark,

        Currently, WebHDFS on branch-1 first uses the user parameter, then the server config value, and then the default value (4096) for the buffer size. WebHDFS on trunk does the same (i.e. (1) user parameter, (2) server conf, (3) default 4096). Hftp first uses the server config value and then the default value.
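
        As a compact restatement of that precedence (a sketch: the io.file.buffer.size key and the 4096 default are from Hadoop, but the helper itself is illustrative, not the actual branch-1 code):

            import org.apache.hadoop.conf.Configuration;

            class BufferSizeResolution {
              // (1) user-supplied buffersize parameter, if present;
              // (2) server-side io.file.buffer.size, if configured;
              // (3) built-in default of 4096.
              static int resolve(Integer userBufferSize, Configuration conf) {
                if (userBufferSize != null) {
                  return userBufferSize;
                }
                return conf.getInt("io.file.buffer.size", 4096);
              }
            }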

        Are you sure that changing the buffer size to 4096 helps performance? Have you done any benchmarks?

        Mark Wagner added a comment -

        Hi Nicholas,

        I have observed a significant performance increase (7-8x) when copying a 1 GB file with the server's io.file.buffer.size set to 64 kB (and the buffer size not specified in the request). Of course you can manually set the buffer size to 4096 bytes, but that also affects the buffer size used to open the file.
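
        A hedged illustration of that coupling through the client API (the cluster address and path are hypothetical; FileSystem.open(Path, int) is a real Hadoop method, and the assumption here is that over a webhdfs:// URI the explicit size is carried as the buffersize request parameter):

            import java.net.URI;
            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FSDataInputStream;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;

            class BufferSizeCoupling {
              public static void main(String[] args) throws Exception {
                FileSystem fs = FileSystem.get(
                    URI.create("webhdfs://namenode:50070"), new Configuration());
                // Requesting a small buffer avoids the oversized network copy,
                // but the same value then also governs how the file is opened.
                FSDataInputStream in = fs.open(new Path("/data/1gb.bin"), 4096);
                in.close();
              }
            }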

        I think we may have gotten crossed up about what is changing. This patch only changes the buffer size used to copy from the FileInputStream onto the network. My understanding is that both WebHDFS and hftp on trunk eventually end up at:

        IOUtils.java

            public static void copyBytes(InputStream in, OutputStream out, long count,
                boolean close) throws IOException {
              byte buf[] = new byte[4096];
              long bytesRemaining = count;
              ...

        which is what this patch is trying to match. Is that your understanding also? There's an argument to be made that this should be configurable, but I figured it best to copy what trunk does.
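
        For reference, the trunk-style call that ends up in the overload quoted above (the copyBytes signature comes from the snippet itself; the wrapper around it is illustrative):

            import java.io.IOException;
            import java.io.InputStream;
            import java.io.OutputStream;
            import org.apache.hadoop.io.IOUtils;

            class TrunkCopySketch {
              // Copies exactly `length` bytes; copyBytes allocates its own fixed
              // 4096-byte buffer, and `false` leaves both streams open for the
              // caller to close.
              static void send(InputStream in, OutputStream out, long length)
                  throws IOException {
                IOUtils.copyBytes(in, out, length, false);
              }
            }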

        Tsz Wo Nicholas Sze added a comment -

        I see. We probably should do the same in branch-1.

        > ... when the server has set io.file.buffer.size at 64kB ...

        Do you mean 4KB? Or should we actually set the buffer size to 64KB?

        Mark Wagner added a comment -

        I'm not sure what's best, but io.file.buffer.size=64KB is the setting I first noticed this on. Although it defaults to 4KB, everything I've seen recommends values from 16KB to 128KB, so I don't think 64KB is an unusual choice.
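
        For concreteness, the setting in question with the 64 kB value from this report (setting it programmatically is only for illustration; on a real cluster it would live in core-site.xml):

            import org.apache.hadoop.conf.Configuration;

            class IoBufferSetting {
              public static void main(String[] args) {
                Configuration conf = new Configuration();
                conf.setInt("io.file.buffer.size", 64 * 1024); // the 64 kB case above
                // Callers reading this key typically fall back to 4096 when unset.
                System.out.println(conf.getInt("io.file.buffer.size", 4096));
              }
            }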

        Tsz Wo Nicholas Sze added a comment -

        Then, how is the performance with 4KB? Is it similar to 64KB? If not, we probably should increase the value. Could you check? Thank you in advance!

        Mark Wagner added a comment -

        I haven't done any real testing for different values of io.file.buffer.size, but it seemed that 4KB and 64KB had similar performance (at least when pulling over WebHDFS). They may be different under load, though. I can look at this more, but I think determining a good value for that parameter is outside the scope of this JIRA. Would you agree?

        Tsz Wo Nicholas Sze added a comment -

        Sure, please feel free to file another JIRA if you find any further improvements. Thanks a lot for the good work!

        +1 patch looks good.

        Tsz Wo Nicholas Sze added a comment -

        I have committed this. Thanks, Mark!

        Matt Foley added a comment -

        Closed upon release of Hadoop 1.2.0.


          People

          • Assignee:
            Mark Wagner
          • Reporter:
            Mark Wagner
          • Votes:
            0
          • Watchers:
            6
