Hadoop HDFS / HDFS-755

Read multiple checksum chunks at once in DFSInputStream

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: hdfs-client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple checksum chunks in a single call to readChunk. This is the HDFS-side use of that new feature.
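      A minimal sketch of the idea, using illustrative names (CHUNK_SIZE, CHECKSUM_SIZE, readChunks) rather than the actual FSInputChecker/DFSInputStream API: instead of one data chunk plus one checksum per call, the reader pulls several chunks and their checksums from the stream in a single call.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class MultiChunkRead {
    static final int CHUNK_SIZE = 512;   // bytes of data per checksum chunk
    static final int CHECKSUM_SIZE = 4;  // a CRC32 checksum is 4 bytes per chunk

    // Read up to maxChunks chunks of data plus their checksums in one call,
    // instead of one chunk (and one checksum) per call.
    static int readChunks(DataInputStream in, byte[] buf, byte[] sums,
                          int maxChunks, int remaining) throws IOException {
        int chunks = Math.min(maxChunks, (remaining + CHUNK_SIZE - 1) / CHUNK_SIZE);
        int dataLen = Math.min(remaining, chunks * CHUNK_SIZE);
        in.readFully(buf, 0, dataLen);                 // all data chunks at once
        in.readFully(sums, 0, chunks * CHECKSUM_SIZE); // all checksums at once
        return dataLen;
    }

    public static void main(String[] args) throws IOException {
        // Toy wire format: 3 chunks of data followed by their 3 checksums.
        byte[] wire = new byte[3 * CHUNK_SIZE + 3 * CHECKSUM_SIZE];
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        byte[] buf = new byte[32 * CHUNK_SIZE];
        byte[] sums = new byte[32 * CHECKSUM_SIZE];
        int n = readChunks(in, buf, sums, 32, 3 * CHUNK_SIZE);
        System.out.println(n); // 1536: three 512-byte chunks in one call
    }
}
```

      The win is fewer calls through the checksum layer per byte read; verification of each chunk against its checksum then proceeds over the filled buffers.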

      1. hdfs-755.txt
        2 kB
        Todd Lipcon
      2. hdfs-755.txt
        2 kB
        Todd Lipcon
      3. hdfs-755.txt
        2 kB
        Todd Lipcon
      4. hdfs-755.txt
        3 kB
        Todd Lipcon
      5. hdfs-755.txt
        4 kB
        Todd Lipcon
      6. hdfs-755.txt
        4 kB
        Todd Lipcon
      7. benchmark-8-256.png
        38 kB
        Todd Lipcon
      8. benchmark.png
        43 kB
        Todd Lipcon
      9. alldata-hdfs.tsv
        251 kB
        Todd Lipcon

        Issue Links

          Activity

          Todd Lipcon created issue -
          Todd Lipcon made changes -
          Link This issue is blocked by HADOOP-3205 [ HADOOP-3205 ]
          Todd Lipcon added a comment -

          Here's a fairly small patch which uses the support for reading multiple checksum chunks from HADOOP-3205. I haven't run the full test suite yet, but got about halfway through and it seems to work - I'll be sure to put it through full testing before it gets committed. I'll also run this on a cluster and get TestDFSIO throughput numbers.

          Performance results look to be in line with what we see in HADOOP-3205.

          Benchmark setup:

          • I put a 700MB file on a pseudo-distributed HDFS cluster.
          • I did 30 "fs -cat" of this file without the patch applied, and 30 with it applied. In both cases I did a couple cats first to make sure it was in the buffer cache. I can run another set of benchmarks that drops cache in between runs if people would like.
          • In both benchmark cases, the patch from HADOOP-3205 was applied. I used a 64K io.file.buffer.size for both the DN and the client.

          T-test results (alternative hypothesis = "with patch is faster")
          Wall clock time: p-value = 2.644e-07 -> 100% confidence. 95% confidence interval of 3.4% speedup
          User time: p-value = 1.638e-10 -> 100% confidence. 95% confidence interval of 3.9% speedup
          Sys time: p-value = 0.982 - that is to say above 95% confidence that we slowed down sys time. The confidence interval is about 0.7%

          The 95% confidence intervals in this benchmark are less impressive sounding than the ones in HADOOP-3205 because I used fewer samples.

          As to why the sys time slowed down, it's a bit of a mystery. My best guess is that, since we're now reading from the network sockets in larger chunks, we occasionally block in the kernel where we used to pretty much always read from a full buffer. But, this isn't too concerning - the wall clock time is what really matters.

          Todd Lipcon made changes -
          Attachment hdfs-755.txt [ 12424574 ]
          Todd Lipcon added a comment -

          Here's an updated patch which fixes some behavior when running against an unpatched Common. If Common includes HADOOP-3205, it will be faster, and if it doesn't include HADOOP-3205, it should still work at the old speed.

          I also ran some more benchmarks over lunch, running "fs -cat bigfile bigfile bigfile ...20 times..." repeatedly with and without the patch. This differs from my previous benchmark in that each JVM runs for a good 40-50 seconds - enough time to fully JIT the code, etc. The patch is about a 3.4% speedup compared to trunk for these long reads as well (at 95% significance level).
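          One common way to get that graceful degradation (a hedged sketch only; the actual patch may do this differently, and the `readChunks` signature below is purely hypothetical) is to probe for the new Common method via reflection and fall back to the single-chunk path when it is absent:

```java
import java.lang.reflect.Method;

public class ChunkSupportProbe {
    // Return true if the given checker class exposes a multi-chunk read method.
    // The method name and parameter list here are illustrative, not the real API.
    static boolean supportsMultiChunk(Class<?> checkerClass) {
        try {
            Method m = checkerClass.getDeclaredMethod(
                "readChunks", long.class, byte[].class, int.class, int.class, byte[].class);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false; // old Common jar: use the one-chunk-at-a-time path
        }
    }

    public static void main(String[] args) {
        // String obviously has no such method, so the probe reports false.
        System.out.println(supportsMultiChunk(String.class)); // false
    }
}
```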

          Todd Lipcon made changes -
          Attachment hdfs-755.txt [ 12424669 ]
          Todd Lipcon made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12424669/hdfs-755.txt
          against trunk revision 835110.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/70/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/70/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/70/console

          This message is automatically generated.

          Todd Lipcon added a comment -

          Evidently I was compiling against a patched jar before (still getting used to the new mvn-based jar rigmarole)... uploading a new patch momentarily to break the compile-time dependency on HADOOP-3205.

          Todd Lipcon made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Todd Lipcon made changes -
          Attachment hdfs-755.txt [ 12424688 ]
          Todd Lipcon made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12424688/hdfs-755.txt
          against trunk revision 835179.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/105/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/105/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/105/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/105/console

          This message is automatically generated.

          Todd Lipcon made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Todd Lipcon added a comment -

          New patch fixes bug that caused simulated dataset tests to fail - those tests use no checksum, and I'd assumed checksumLength != 0 in the previous iteration.

          Todd Lipcon made changes -
          Attachment hdfs-755.txt [ 12424749 ]
          Todd Lipcon made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12424749/hdfs-755.txt
          against trunk revision 835179.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/106/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/106/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/106/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/106/console

          This message is automatically generated.

          Todd Lipcon added a comment -

          No additional tests included since this is an optimization to the existing read path, covered by all the existing HDFS tests that read files.

          Note: after HADOOP-3205 has been committed we should re-submit this patch to Hudson before committing this one to double-check the tests.

          Eli Collins added a comment -

          Hey Todd – patch looks great. Did you test w/o checksums enabled?

          Todd Lipcon made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Todd Lipcon added a comment -

          Did you test w/o checksums enabled?

          TestDFSShell covers this case by corrupting a block and reading it with the -ignoreCrc option. Some other tests cover this too - I added a printout in DFSInputStream to fire when checksums were disabled and it did so. In investigating this I also noticed that TestFSInputChecker in hdfs was supposed to do this, but actually didn't (there was an unused "readCS" parameter). This new patch fixes that bug in the test.

          Todd Lipcon made changes -
          Attachment hdfs-755.txt [ 12426824 ]
          Todd Lipcon made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426824/hdfs-755.txt
          against trunk revision 886322.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/131/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/131/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/131/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/131/console

          This message is automatically generated.

          Eli Collins added a comment -

          Patch looks great. And nice catch with TestFSInputChecker. +1

          Raghu Angadi added a comment -

          The patch looks good, except that BlockReader.in seems to be buffered with a large buffer. Most of the data might still be going through an extra copy.

          I haven't run the patch yet...

          Todd Lipcon added a comment -

          I checked Raghu's idea and it's correct - since there's a buffer in BlockReader and it does some small reads at the beginning of the block (to get status code, etc) it "misaligns" the reads. I'm benchmarking various values for this buffer now with similar benchmarks as I ran in HADOOP-3205. (that JIRA should still be fine to commit)

          Todd Lipcon added a comment -

          Here's a graph showing benchmark results from an overnight run. The benchmark setup was:

          • Set the CHUNKS_PER_READ variable to a number of different values (seen on the bottom of the X axis - 1,2,4,etc)
          • Set the internal buffer of DFSClient (around line DFSClient.java:1618) to a number of different small values, plus 65536. 65536 was chosen as that was the io.file.buffer.size for this test, so it's what the current patch does.
          • For each combination of the above two variables, I ran 100 trials of catting 1GB off of a local datanode (pseudo-distributed HDFS). The machine had free RAM so the data stayed in buffer cache. I think this is appropriate for this benchmark as we're looking to see CPU improvement.
          • Each trial is graphed as a box plot. The top edge of the box is the 75th percentile. The bottom edge is the 25th percentile. The middle line is the median.

          Here's how I interpret the results:

          • As we saw in HADOOP-3205, there is no significant performance difference between setting CHECKSUMS_PER_READ (nee MAX_CHUNKS) to a value higher than 8 or 16. In that JIRA we decided to set it to 32. So I think that decision should stand. The second png (benchmark-8-256.png) zooms in on only the "good" settings of CHECKSUMS_PER_READ.
          • For any "good" setting, we see an advantage of setting the internal buffer small. This is due to Raghu's observation - it allows the reader to skip a buffer copy for the reads.

          The results above follow the intuition:
          When BlockReader is constructed, it reads the following things:

          • DataTransferProtocol.Status (2 bytes)
          • DataChecksum: byte + int (5 bytes)
          • firstChunkOffset (8 bytes)
            (total 15 bytes)

          Then for each packet:

          • packetLen (4 bytes)
          • offsetInBlock (8 bytes)
          • seq num (8 bytes)
          • lastPacket (1 byte)
          • datalen (4 bytes)
            (total 25 bytes)

          So a 16 byte internal buffer is big enough for the connection headers, but not big enough for the packet headers. The 512 byte internal buffer is big enough that there's going to be significant "overread" into the packet data itself, and significant copying will happen.
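          The byte counts above can be sanity-checked with simple arithmetic (the constant names here are illustrative, not from the HDFS source):

```java
public class HeaderSizes {
    // Per-connection response header read when BlockReader is constructed.
    static final int CONN_HEADER = 2 /* status */ + 5 /* checksum type + bytesPerChecksum */
                                 + 8 /* firstChunkOffset */;
    // Per-packet header.
    static final int PKT_HEADER = 4 /* packetLen */ + 8 /* offsetInBlock */
                                + 8 /* seqno */ + 1 /* lastPacket */ + 4 /* datalen */;

    public static void main(String[] args) {
        System.out.println("connection header: " + CONN_HEADER + " bytes"); // 15
        System.out.println("packet header: " + PKT_HEADER + " bytes");      // 25
        System.out.println("16-byte buffer fits connection header: " + (16 >= CONN_HEADER)); // true
        System.out.println("16-byte buffer fits packet header: " + (16 >= PKT_HEADER));      // false
    }
}
```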

          I'm unsure whether setting the internal buffer low is "safe". That is to say, is it always wrapped by an external buffer, or are people depending on it for performance? If it's not always wrapped, this change will be slightly more complicated.

          What do people think about committing now as is (with the large internal buffer) and then working in a separate JIRA to eliminate the copy? As the graphs show, there is still a performance improvement (about 6.0 sec to 5.6 sec here).

          Todd Lipcon made changes -
          Attachment benchmark.png [ 12427352 ]
          Attachment benchmark-8-256.png [ 12427353 ]
          Raghu Angadi added a comment -

          Thanks for the benchmarks. What does this translate to for user cpu improvement (say with 32 byte buffer in DFSClient)?

          I think the internal buffer should be small for this patch. It does not matter whether a user always wraps with another buffer or not... they essentially get performance in line with their read size. The bufferSize passed to FSInputChecker is essentially a hint.

          I need to look more into the limit on CHUNKS_PER_READ. I don't see much of a reason to limit it in FSInputChecker (within limits); if a user invokes a read with a large buffer, the underlying FS (DFSClient in this case) should have access to that buffer...

          Todd Lipcon added a comment -

          What does this translate to for user cpu improvement (say with 32 byte buffer in DFSClient)?

          Here's a table of median times for 1G cat. The "before" row is CHUNKS_PER_READ=1 (the pre-HADOOP-3205 behavior) and the internal buffer size 65536 (typical value). The "after" row is CHUNKS_PER_READ=32 and internal buffer 64 bytes.

                       User   Sys    Wall
           Before      5.310  1.260  6.010
           After       4.935  1.235  5.350
           Improvement 7.06%  1.98%  10.98%

          The sys difference isn't significant according to a t-test. The user/wall are definitely significant (p < 2.2e-16). Changing around the internal buffer between 32, 50, 64 bytes didn't make any significant differences to any of the measurements.
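          The improvement percentages follow directly from the medians in the table; a quick check (with a hypothetical helper that rounds to two decimal places):

```java
public class Improvement {
    // Percentage improvement of `after` relative to `before`, rounded to 2 decimals.
    static double pct(double before, double after) {
        return Math.round((before - after) / before * 10000) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(pct(5.310, 4.935)); // 7.06  (user)
        System.out.println(pct(1.260, 1.235)); // 1.98  (sys)
        System.out.println(pct(6.010, 5.350)); // 10.98 (wall)
    }
}
```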

          The bufferSize passed to FSInputChecker is essentially a hint

          My question is whether people are actually treating it like that in practice. For example, SequenceFile.Reader doesn't create its own BufferedInputStream to wrap fs.open. It just passes the user-specified buffer size through. If our own code isn't wrapping these things with a buffer, should we expect that user code is?

          Todd Lipcon added a comment -

          In case anyone wants to look at the raw data, uploading it. The columns are the various output-format options from GNU "time", some more useful than others.

          Todd Lipcon made changes -
          Attachment alldata-hdfs.tsv [ 12427406 ]
          Raghu Angadi added a comment -

          User code should use buffering for application-specific reasons. Maybe the 'bufferSize' argument for FSInputStream is flawed to start with.

          My impression is that the main purpose of this patch is to reduce a copy; keeping the large buffer prohibits that.

          Even when a SequenceFile has very small records (avg < 1k?), I think it might not have a net negative effect; system calls are fairly cheap, even for fairly small reads.

          Do you see FSInputChecker or DFSClient evolving to dynamically decide whether a buffer should be used in the near future?

          +1 for the patch itself.

          btw, I ran 'time bin/hadoop fs -cat 1gbfile > /dev/null' with the NN, DN, and client on the same machine, but have not been able to see an improvement. I will verify whether I am really running the patch.

          Todd Lipcon added a comment -

          User code should use buffering for application specific reasons. May be 'bufferSize' argument for FSInputStream is flawed to start with.

          Personally, I agree, but I think it's out of scope for this JIRA to fix that.

          My impression is that main purpose of this patch is to reduce a copy. keeping the large buffer prohibits that.

          That's true, but I think we need to thoroughly benchmark SequenceFile.Reader there, and do it in a separate JIRA. This one as it stands is not a breaking change, in that it should not reduce performance for any workload. Having a small internal buffer can potentially be breaking, so we should benchmark how big that break could be and weigh it vs the improvements.

          Aside from making a smaller internal buffer, there are a couple of other options that might be less "dangerous" - e.g. using a small buffer for the initial reads, then creating a new BufferedInputStream with a fresh buffer to start the data reads. This would get rid of the "misalignment" issue here. ChecksumFileSystem has this same problem, so introducing our own BufferedInputStream implementation with some tricks to re-align its reads against the buffer could help there as well.
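The chunk-at-a-time reading under discussion can be sketched roughly as below. This is an illustrative Python sketch, not the actual FSInputChecker/DFSInputStream API; the 512-byte chunk size and CHUNKS_PER_READ=32 mirror values mentioned in this issue, and `verify_chunks` is a hypothetical helper standing in for the readChunk path.

```python
import zlib

BYTES_PER_CHECKSUM = 512   # checksum chunk size used throughout this issue
CHUNKS_PER_READ = 32       # chunks verified per call after the patch

def verify_chunks(data, crcs):
    """Verify up to CHUNKS_PER_READ checksum chunks of `data` against the
    parallel list `crcs` in a single call; return the number of bytes
    covered. Pre-HADOOP-3205 behavior corresponds to CHUNKS_PER_READ=1."""
    n = min(len(crcs), CHUNKS_PER_READ, -(-len(data) // BYTES_PER_CHECKSUM))
    for i in range(n):
        chunk = data[i * BYTES_PER_CHECKSUM:(i + 1) * BYTES_PER_CHECKSUM]
        if zlib.crc32(chunk) != crcs[i]:
            raise IOError("checksum error in chunk %d" % i)
    return min(len(data), n * BYTES_PER_CHECKSUM)

payload = bytes(range(256)) * 8                       # 2048 bytes = 4 chunks
crcs = [zlib.crc32(payload[i:i + BYTES_PER_CHECKSUM])
        for i in range(0, len(payload), BYTES_PER_CHECKSUM)]
assert verify_chunks(payload, crcs) == 2048           # all 4 chunks in one call
```

The point of batching is that the per-call overhead (loop setup, bounds checks, buffer bookkeeping) is paid once per 32 chunks instead of once per chunk.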

          Even when a sequencefile has very small records (avg < 1k?)

          I've seen SequenceFiles used for even smaller records - down to a few bytes (e.g. IntWritable keys and values). Syscalls are cheap, but not that cheap compared to an 8-byte copy. So I don't think we should do optimizations that would destroy performance in this scenario.

          ...but not been able to see improvement. will verify if I am really running the patch.

          Did you run this patch with a core jar that was compiled with HADOOP-3205? To test, you need to do "ant -Dresolvers=internal mvn-install" from Common, with HADOOP-3205 applied. Then, in the HDFS tree, "ant -Dresolvers=internal clean-cache binary" to make sure it pulls your local common build.

          Raghu Angadi added a comment -

          There is always a buffer of 512 bytes (the checksum chunk size), so the worst case is 512-byte reads. If 512 is not large enough, we can decide on some size like 4k. This way large readers benefit from the reduced copy and small readers pay a small penalty (1 syscall per 4k).

          The misalignment can occur even after the first packet. Another option is to have two buffers which are read alternately for crc and data (each time checking whether the other buffer has available data).

          > So, I don't think we should do optimizations that would destroy performance in this scenario.

          True, but at the same time this is an optimization JIRA.

          I didn't get around to reproducing the cpu improvement. I ran the commands you gave (in email); will try again today.

          I have already given a +1 for the patch. We should just note that it needs more work to actually make use of HADOOP-3205.
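The 512-byte vs 4k trade-off above is simple arithmetic: how many read() calls a client issues to pull 1 MB through the checksum layer at each buffer size. The helper below is just illustrative; the sizes mirror the ones discussed in this thread.

```python
def reads_per_mb(buffer_size):
    """Number of read() calls needed to consume 1 MB at a given buffer size."""
    MB = 1 << 20
    return -(-MB // buffer_size)   # ceiling division

assert reads_per_mb(512) == 2048    # worst case: one syscall per checksum chunk
assert reads_per_mb(4096) == 256    # 8x fewer calls with a 4k buffer
assert reads_per_mb(65536) == 16    # typical 64k client buffer
```

So a 4k compromise buffer keeps the syscall count modest while still avoiding the extra copy that a large internal buffer forces on big reads.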

          Todd Lipcon made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Todd Lipcon made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427406/alldata-hdfs.tsv
          against trunk revision 895877.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/169/console

          This message is automatically generated.

          Todd Lipcon made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Todd Lipcon added a comment -

          Reattaching same patch so Hudson doesn't try to apply benchmark results as a patch.

          Todd Lipcon made changes -
          Attachment hdfs-755.txt [ 12429481 ]
          Todd Lipcon made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12429481/hdfs-755.txt
          against trunk revision 895877.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/console

          This message is automatically generated.

          Todd Lipcon added a comment -

          I think these failures are spurious - the same test passes locally:

              [junit] Running org.apache.hadoop.hdfs.TestDataTransferProtocol
              [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.678 sec
          
          Todd Lipcon added a comment -

          Ran this patch through the full test suite on a local build machine, and the only failures were completely unrelated (and different from the ones that failed above). I believe this is ready to commit, since it was already +1'd above.

          Tom White added a comment -

          I've just committed this. Thanks Todd!

          Tom White made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.22.0 [ 12314241 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #161 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/161/)
          HDFS-755. Read multiple checksum chunks at once in DFSInputStream. Contributed by Todd Lipcon.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #195 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/195/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #94 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/94/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #183 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/183/)

          Tom White made changes -
          Fix Version/s 0.21.0 [ 12314046 ]
          Fix Version/s 0.22.0 [ 12314241 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes: 0
              Watchers: 10
