Hadoop HDFS / HDFS-1081

Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: security
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We've seen a significant decrease in the performance of DistributedFileSystem::getFileBlockLocations() with security turned on in Y20. This JIRA is for tracking and correcting it on both Y20 and trunk.

      1. results.xlsx
        62 kB
        Jakob Homan
      2. HDFS-1081-trunk.patch
        15 kB
        Jakob Homan
      3. HDFS-1081-trunk.patch
        15 kB
        Jakob Homan
      4. bm1081.scala
        3 kB
        Jakob Homan
      5. ASF.LICENSE.NOT.GRANTED--HADOOP-1081-Y20-2.patch
        14 kB
        Jakob Homan
      6. ASF.LICENSE.NOT.GRANTED--HADOOP-1081-Y20-1.patch
        14 kB
        Jakob Homan

        Issue Links

          Activity

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12449889/HDFS-1081-trunk.patch
          against trunk revision 965697.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/442/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/442/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/442/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/442/console

          This message is automatically generated.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #347 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/347/)
          HDFS-1081. Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems.

          Jakob Homan added a comment -

          I've committed this. Resolving as fixed.

          Jakob Homan added a comment -

           Hudson's too slow. Manually ran tests. TestFileConcurrentReader and TestFiHFlush fail on trunk for me as well. TestFileAppend4, TestBlockToken and TestJspHelper all pass for me with the patch (this is known behavior that Kan's looking into). I'm going to commit the patch.

          Jakob Homan added a comment -

          Re-submitting to Hudson. At this point it's nigh impossible to tell a valid test failure from a routine one...

          Jakob Homan added a comment -

           Canceling patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12449889/HDFS-1081-trunk.patch
          against trunk revision 965621.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/439/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/439/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/439/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/439/console

          This message is automatically generated.

          Owen O'Malley added a comment -

          +1

          Jakob Homan added a comment -

          Submitting patch.

          Jakob Homan added a comment -

          Correct patch. Previous was not sync'ed with trunk.

          Jakob Homan added a comment -

          Raw data of results.

          Jakob Homan added a comment -

          Script to benchmark this optimization.

          Jakob Homan added a comment -

           Patch for trunk. Basically the same as the 20S patch, but with modifications for the new BlockManager. We've been running the 20 patch in production and it's working well.

           This optimization was benchmarked on a 5-DN cluster, using the (to-be-attached) script to measure call times with and without the patch on trunk.
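
           For context, here is a minimal Java sketch of the kind of timing loop involved (an illustrative reconstruction only, not the attached bm1081.scala; the class name, file path argument, and output format are placeholders):

             import org.apache.hadoop.conf.Configuration;
             import org.apache.hadoop.fs.FileStatus;
             import org.apache.hadoop.fs.FileSystem;
             import org.apache.hadoop.fs.Path;

             public class GetBlockLocationsTiming {
               public static void main(String[] args) throws Exception {
                 FileSystem fs = FileSystem.get(new Configuration());
                 // One pre-created test file per block count, passed as args[0]
                 FileStatus status = fs.getFileStatus(new Path(args[0]));
                 final int calls = 100;                 // 100 calls per block count, as below
                 long total = 0;
                 for (int i = 0; i < calls; i++) {
                   long start = System.currentTimeMillis();
                   fs.getFileBlockLocations(status, 0, status.getLen()); // round trip to the NN
                   total += System.currentTimeMillis() - start;
                 }
                 System.out.println("mean ms per call: " + (double) total / calls);
                 // Per-call samples would be recorded to compute the median/std-dev numbers below.
               }
             }

           Each per-block-count test file would presumably be timed this way, with and without the patch applied to the NameNode.
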
          Results:
           Round-trip times for the getBlockLocations call (in milliseconds) for files with the specified number of blocks, across 100 calls for each block count.

           Without patch

            1 blocks 2 blocks 3 blocks 4 blocks 5 blocks 6 blocks 7 blocks 8 blocks 9 blocks 10 blocks 20 blocks 50 blocks 100 blocks 250 blocks 500 blocks 1000 blocks
          mean 2.20 2.07 2.06 2.05 2.05 2.01 2.01 2.01 2.03 2.07 42.23 4.05 7.02 16.08 30.47 50.57
          median 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 42.00 4.00 7.00 16.00 28.00 49.50
          std dev 1.54 0.38 0.28 0.36 0.26 0.10 0.10 0.10 0.17 0.70 1.24 0.33 0.32 1.04 27.00 5.04

          With patch

            1 blocks 2 blocks 3 blocks 4 blocks 5 blocks 6 blocks 7 blocks 8 blocks 9 blocks 10 blocks 20 blocks 50 blocks 100 blocks 250 blocks 500 blocks 1000 blocks
          mean 1.15 1.01 1.02 1.09 1.00 1.01 1.00 1.01 1.00 1.01 40.76 2.00 3.97 11.61 25.61 88.02
          median 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 41.00 2.00 4.00 10.00 24.00 71.00
          std dev 1.22 0.10 0.14 0.90 0.00 0.10 0.00 0.10 0.00 0.10 0.67 0.00 1.33 8.16 6.38 115.07

           Raw difference: how much less time the call took with the patch (negative numbers are better)

            1 blocks 2 blocks 3 blocks 4 blocks 5 blocks 6 blocks 7 blocks 8 blocks 9 blocks 10 blocks 20 blocks 50 blocks 100 blocks 250 blocks 500 blocks 1000 blocks
          mean -1.05 -1.06 -1.04 -0.96 -1.05 -1.00 -1.01 -1.00 -1.03 -1.06 -1.47 -2.05 -3.05 -4.47 -4.86 37.45
          median -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -2.00 -3.00 -6.00 -4.00 21.50
          std dev -0.33 -0.28 -0.14 0.54 -0.26 0.00 -0.10 0.00 -0.17 -0.60 -0.57 -0.33 1.01 7.12 -20.62 110.03

           % difference: the patched call's time as a fraction of the unpatched time

            1 blocks 2 blocks 3 blocks 4 blocks 5 blocks 6 blocks 7 blocks 8 blocks 9 blocks 10 blocks 20 blocks 50 blocks 100 blocks 250 blocks 500 blocks 1000 blocks
          mean 0.52 0.49 0.50 0.53 0.49 0.50 0.50 0.50 0.49 0.49 0.97 0.49 0.57 0.72 0.84 1.74
          median 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.98 0.50 0.57 0.63 0.86 1.43
          std dev 0.79 0.26 0.51 2.51 0.00 1.00 0.00 1.00 0.00 0.14 0.54 0.00 4.19 7.83 0.24 22.84

          For files with 1-100 blocks we cut the time in half.

          At 20 blocks I see a big spike in the amount of time to do the processing, but this is in both the patched and unpatched versions. I'm not sure what's causing this; it warrants looking into.

           This patch saves a lot of NameNode CPU time by doing the expensive calculation only once, but its network usage could still be improved. This starts to show up at 250+ blocks, where we send a larger and larger amount of data, which eventually overwhelms the CPU savings. Files with 250+ blocks are exceedingly rare in HDFS, this can also be improved, and I'll open another JIRA to optimize it.

          I think the data support this particular optimization. Patch is ready for review.

          Jakob Homan added a comment -

           Updated with correct protocol bumps. Per Konstantin, the protocol should be bumped to 1 + the current trunk version, to avoid collisions with previously used values. The trunk versions of the protocols will be bumped as well, as soon as the trunk patch is ready...

          Jitendra Nath Pandey added a comment -

           +1. Minor nit, probably a typo: the DatanodeProtocol version is bumped by 2.

          Jakob Homan added a comment -

          Patch for review.

           In our Y20S benchmarking, we saw a dramatic increase in the time spent in getBlockLocations, due to two operations introduced by block access tokens:

          TokenIdentifier.getBytes() is expensive and is called twice
           In getFileBlockLocations, TokenIdentifier.getBytes() is called twice in rapid succession:

             // Token.java:50
             public Token(T id, SecretManager<T> mgr) {
               password = mgr.createPassword(id);  // calls id.getBytes()...
               identifier = id.getBytes();         // ...and here, a second time
               // ...
             }

           This call is relatively expensive, as the BlockTokenIdentifier is serialized to a new DataOutputBuffer and copied to a new array each time. This patch caches the result of the getBytes() call and returns it, assuming no mutation to the token state.
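
           A rough sketch of the caching idea (illustrative only, not the committed patch; the class and field names here are placeholders):

             import java.io.IOException;
             import java.util.Arrays;

             import org.apache.hadoop.io.DataOutputBuffer;
             import org.apache.hadoop.io.Writable;

             // Illustrative sketch only: cache the serialized form so repeated
             // getBytes() calls do not re-serialize the identifier.
             public abstract class CachingIdentifierSketch implements Writable {
               private byte[] cachedBytes;          // computed at most once

               public byte[] getBytes() {
                 if (cachedBytes == null) {
                   DataOutputBuffer buf = new DataOutputBuffer(128);
                   try {
                     write(buf);                    // Writable.write() serializes the fields
                   } catch (IOException e) {
                     throw new RuntimeException("unable to serialize identifier", e);
                   }
                   cachedBytes = Arrays.copyOf(buf.getData(), buf.getLength());
                 }
                 return cachedBytes;                // reused on every later call
               }
             }

           This mirrors the "assuming no mutation to the token state" caveat above: the cached array is only safe to hand back because the identifier is not modified after it is first serialized.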

          For n blocks in a getBlockLocations() call, n block access tokens are created and each is relatively expensive
           In a call to getBlockLocations(), a new Token<BlockTokenIdentifier> is created and attached to every block that is returned. Each new Token<BlockTokenIdentifier> means a call to hmac.doFinal on the BTI's bytes. This hmac calculation, which generates the token's password, turns out to be relatively expensive and was dramatically slowing down the call, particularly for files with large numbers of blocks.

           This patch updates BlockTokenIdentifiers to be valid for a collection of block IDs rather than a single block ID. This allows us to generate a single Token<BlockTokenIdentifier> for every call to getBlockLocations, calling the hmac function only once. A quick benchmark of hmac.doFinal shows that its processing time is pretty much constant even for large byte arrays (by our standards for these tokens), meaning that with this optimization, our time in hmac for n blocks should be essentially constant. This is a fairly surgical change and does not require much change to other parts of the Token authentication and authorization code. For files with a small number of blocks there should be no performance penalty.
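
           A small standalone sketch of the kind of check behind that hmac.doFinal claim (illustrative only, not the attached benchmark; the key, identifier sizes, and iteration counts are arbitrary):

             import javax.crypto.Mac;
             import javax.crypto.spec.SecretKeySpec;

             // Illustrative only: times HmacSHA1 over identifiers sized as if they
             // covered 1..1000 block IDs (~8 bytes per ID), to show the per-call
             // cost barely grows with identifier size.
             public class HmacCostCheck {
               public static void main(String[] args) throws Exception {
                 Mac hmac = Mac.getInstance("HmacSHA1");
                 hmac.init(new SecretKeySpec(new byte[20], "HmacSHA1"));
                 final int reps = 10000;
                 for (int ids : new int[] {1, 10, 100, 1000}) {
                   byte[] identifier = new byte[ids * 8];
                   long start = System.nanoTime();
                   for (int i = 0; i < reps; i++) {
                     hmac.doFinal(identifier);      // one password computation per call
                   }
                   System.out.printf("%4d block IDs: %d ns per doFinal%n",
                       ids, (System.nanoTime() - start) / reps);
                 }
               }
             }

           If the observation above holds, the per-call times stay in the same ballpark across these sizes, which is why one combined token per getBlockLocations() call is so much cheaper than n per-block tokens.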


            People

            • Assignee: Jakob Homan
            • Reporter: Jakob Homan
            • Votes: 0
            • Watchers: 13
