Hadoop HDFS / HDFS-1043

Benchmark overhead of server-side group resolution of users

    Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: benchmarks
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Server-side user group resolution was introduced in HADOOP-4656.
      The benchmark should repeatedly request user group resolution from the name-node and periodically reset the NN's user group cache.
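
      A minimal sketch of that measurement loop, for illustration only: the target path, the refresh interval, and the refreshGroupCache() hook are hypothetical stand-ins for whatever mechanism is actually used to reset the name-node's cache.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      // Illustration: open an existing file repeatedly, so every request makes the
      // name-node look up the caller's groups (from its cache or by resolving them),
      // and periodically drop that cache so the resolution cost is actually exercised.
      public class GroupResolutionLoad {
        public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          Path target = new Path("/benchmark/file0");   // assumed to exist
          int refreshEvery = 100;                       // hypothetical refresh interval
          int numOps = 100000;
          long start = System.currentTimeMillis();
          for (int i = 1; i <= numOps; i++) {
            fs.open(target).close();                    // permission check needs the caller's groups
            if (i % refreshEvery == 0) {
              refreshGroupCache();                      // stand-in for resetting the NN's UG cache
            }
          }
          long elapsedMs = Math.max(1, System.currentTimeMillis() - start);
          System.out.println("ops/sec: " + (numOps * 1000L / elapsedMs));
        }

        private static void refreshGroupCache() {
          // Hypothetical hook: in practice this would ask the name-node to drop its
          // cached user-to-groups mappings.
        }
      }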

      1. UGCRefresh.patch
        7 kB
        Konstantin Shvachko
      2. UGCRefresh.patch
        7 kB
        Konstantin Shvachko

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #275 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/275/)

        Hudson added a comment -

        Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #302 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/302/)

        Hudson added a comment -

        Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #146 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/146/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #220 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/220/)

        Konstantin Shvachko added a comment -

        I just committed this.

        Suresh Srinivas added a comment -

        +1 for the patch. I am surprised that the ConcurrentHashMap used for caching groups affects performance when 100 threads are used. We should perhaps create a bug to track this optimization.
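
        For context, a simplified sketch of the kind of cache being discussed: a ConcurrentHashMap keyed by user name with per-entry expiry, falling back to a slow resolver on a miss. This is an illustration under assumed names, not the actual Groups implementation.

        import java.util.List;
        import java.util.concurrent.ConcurrentHashMap;

        // Simplified group cache: lookups go through a ConcurrentHashMap and fall back
        // to the (slow) underlying resolver when an entry is missing or expired.
        public class CachedGroupResolver {
          public interface GroupResolver { List<String> resolve(String user); }

          private static class Entry {
            final List<String> groups;
            final long fetchedAt;
            Entry(List<String> groups, long fetchedAt) { this.groups = groups; this.fetchedAt = fetchedAt; }
          }

          private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<String, Entry>();
          private final GroupResolver resolver;
          private final long timeoutMs;

          public CachedGroupResolver(GroupResolver resolver, long timeoutMs) {
            this.resolver = resolver;
            this.timeoutMs = timeoutMs;
          }

          public List<String> getGroups(String user) {
            long now = System.currentTimeMillis();
            Entry e = cache.get(user);
            if (e == null || now - e.fetchedAt > timeoutMs) {
              List<String> groups = resolver.resolve(user);   // e.g. a unix shell lookup
              cache.put(user, new Entry(groups, now));
              return groups;
            }
            return e.groups;
          }

          public void refresh() {
            cache.clear();   // a cache reset forces every user back through the slow path
          }
        }

        Even though ConcurrentHashMap reads avoid coarse locking, the benchmark results in Konstantin's comment below show the cached path losing to direct resolution at 100 threads, which is the surprise being discussed.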

        Konstantin Shvachko added a comment -

        A minor correction to JavaDoc, and a merge with current trunk.

        Konstantin Shvachko added a comment -

        I ran NNThroughputBenchmark -op open. This opens a lot of files (100,000 - 500,000) on the name-node. The name-node performs server-side user group resolution. In version 0.20.1 we used to pass the user group(s) along with the user name. The security branch (and trunk) use server-side UG resolution instead. In the regular case for 0.20.100, most resolutions are served from the server-side cache; the actual unix shell group resolution happens only if the entry is not cached or the cache has expired.
        I ran the benchmark in two variants. In the first, the cache is never refreshed, so user groups always come from the cache. In the second, clients frequently send requests to refresh the cache, so the server actually resolves groups most of the time.
        I also ran the benchmark with different numbers of threads (server handlers). The single-threaded (sequential) variant measures the actual overhead of server-side UG resolution. The 100-thread variant is closer to what is used in real clusters.
        The table below summarizes the results. The numbers are operations per second; a quick check of the percentages follows the table.

        • UG cache resolution adds about 8% overhead per operation.
        • Direct UG resolution adds 34%. This should not happen often, and
        • in the (real) concurrent world this only results in 8% overhead.
        • An unexpected result is that the cache turns out to be inefficient when accessed concurrently. I verified this many times; the numbers vary, but getting cached values is always slower than direct resolution. This is not expected and should be addressed in future optimizations.
        Version                                1 thread (ops/sec)   100 threads (ops/sec)
        0.20.1, no server-side UG resolution   48638                67676
        0.20.100, UG cache                     44581 (-8%)          53418 (-18%)
        0.20.100, direct UG resolution         31869 (-34%)         62500 (-8%)
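
        The overhead percentages above appear to be the relative drop in ops/sec against the 0.20.1 baseline. A quick check of the single-threaded figures (illustrative arithmetic only):

        // Relative change in throughput vs. the 0.20.1 single-threaded baseline (48638 ops/sec).
        public class OverheadCheck {
          static double change(double baseline, double measured) {
            return (measured - baseline) / baseline * 100.0;
          }
          public static void main(String[] args) {
            System.out.printf("UG cache, 1 thread:             %.1f%%%n", change(48638, 44581)); // about -8%
            System.out.printf("direct UG resolution, 1 thread: %.1f%%%n", change(48638, 31869)); // about -34%
          }
        }
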
        Konstantin Shvachko added a comment -

        Sorry, please disregard this comment; it is intended for HADOOP-6637. I'll post these benchmark results later.

        Konstantin Shvachko added a comment -

        I ran the benchmark on three versions of Hadoop:

        1. 0.20.1, which does not have any security code, so Kerberos and delegation token authentication are not applicable there.
        2. 0.20.100, which contains the latest state of the security implementation.
        3. 0.22 trunk, which did not have all the latest security patches applied at the time of benchmarking (just for reference).

        The benchmark creates a connection to the RPC server 1000 times. Each time, the RPC server authenticates the client using one of the three authentication methods (no authentication, Kerberos, delegation token). The result is the average latency of the connection request.

        The table below shows that

        • When security is turned off, the new code still adds 14% overhead.
        • The overhead for Kerberos authentication is predictably huge.
        • The delegation token authentication was intended as a fast alternative to Kerberos. It is somewhat faster, but not nearly as fast as the non-secure version. This should definitely be the focus of future optimizations.
        • 0.22 is 1-2% slower compared to 0.20.100. It is expected to catch up once all the latest security contributions are ported to trunk.

        Version    No security    Kerberos   Delegation Token
        0.20.1     0.920          n/a        n/a
        0.20.100   1.047 (+14%)   44.670     42.615
        0.22       1.597 (+73%)   45.148     43.455
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12439103/UGCRefresh.patch
        against trunk revision 923467.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/131/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/131/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/131/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/131/console

        This message is automatically generated.

        Konstantin Shvachko added a comment -

        I modified NNThroughputBenchmark. A new input argument -UGCacheRefreshCount specifies after how many operations the benchmark should reset the name-node's user group cache. When set to 1 it refreshes the cache for every operation. By default it never resets the cache. Using this option one can evaluate the overhead of UG resolution for any operation available in NNThroughputBenchmark.
        For the 0.20 branch comparison, NNThroughputBenchmark can be run as is, since there was no server-side UG resolution at the time.
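
        A minimal sketch of what the option amounts to inside the operation loop. The flag name -UGCacheRefreshCount and the never-reset default come from the comment above; the surrounding class and the refresh hook are assumptions for illustration, not the actual NNThroughputBenchmark code.

        // Illustration of the -UGCacheRefreshCount behaviour described above:
        // a value of N resets the user group cache every N operations, 1 resets it on
        // every operation, and a non-positive value (the default here) never resets it.
        public class UGCacheRefreshSketch {
          public static void main(String[] args) {
            int ugCacheRefreshCount = args.length > 0 ? Integer.parseInt(args[0]) : 0;
            int numOps = 1000;
            for (int i = 1; i <= numOps; i++) {
              executeOp(i);                                           // one benchmark operation, e.g. "open"
              if (ugCacheRefreshCount > 0 && i % ugCacheRefreshCount == 0) {
                refreshUserGroupCache();                              // hypothetical reset of the NN's UG cache
              }
            }
          }
          private static void executeOp(int i) { /* perform one name-node operation */ }
          private static void refreshUserGroupCache() { /* drop cached user-to-groups mappings */ }
        }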


          People

          • Assignee: Konstantin Shvachko
          • Reporter: Konstantin Shvachko
          • Votes: 0
          • Watchers: 1
