Hadoop Common
  1. Hadoop Common
  2. HADOOP-10590

ServiceAuthorizationManager is not threadsafe

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: security
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The mutators in ServiceAuthorizationManager are synchronized. The accessors are not synchronized.
      This results in visibility issues when ServiceAuthorizationManager's state is accessed from different threads.

      1. performance-test-without-rpc.patch
        4 kB
        Benoy Antony
      2. performancetest.patch
        5 kB
        Benoy Antony
      3. HADOOP-10590.patch
        2 kB
        Benoy Antony

        Issue Links

          Activity

          Benoy Antony created issue -
          Hide
          Benoy Antony added a comment -

          Attaching a patch which makes the field in ServiceAuthorizationManager volatile instead of synchronizing mutators and accessors.
          Since the state is volatile, the effectively mutable state is safely published and hence correctly visible to all threads accessing the state.

          To understand more on the validity of this safe publication approach, please see https://issues.apache.org/jira/browse/HADOOP-10448?focusedCommentId=13980112

          Show
          Benoy Antony added a comment - Attaching a patch which makes the field in ServiceAuthorizationManager volatile instead of synchronizing mutators and accessors. Since the state is volatile , the effectively mutable state is safely published and hence correctly visible to all threads accessing the state. To understand more on the validity of this safe publication approach, please see https://issues.apache.org/jira/browse/HADOOP-10448?focusedCommentId=13980112
          Benoy Antony made changes -
          Field Original Value New Value
          Attachment HADOOP-10590.patch [ 12644083 ]
          Hide
          Benoy Antony added a comment -

          Since there is no change in functionality , no tests are modified or added.

          Show
          Benoy Antony added a comment - Since there is no change in functionality , no tests are modified or added.
          Benoy Antony made changes -
          Attachment HADOOP-10590.patch [ 12644084 ]
          Benoy Antony made changes -
          Attachment HADOOP-10590.patch [ 12644084 ]
          Benoy Antony made changes -
          Attachment HADOOP-10590.patch [ 12644085 ]
          Benoy Antony made changes -
          Attachment HADOOP-10590.patch [ 12644083 ]
          Benoy Antony made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Benoy Antony made changes -
          Link This issue relates to HADOOP-10593 [ HADOOP-10593 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12644085/HADOOP-10590.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3933//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3933//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644085/HADOOP-10590.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3933//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3933//console This message is automatically generated.
          Hide
          Benoy Antony added a comment -

          Daryn Sharp, Could you please take a look at this ?

          Show
          Benoy Antony added a comment - Daryn Sharp , Could you please take a look at this ?
          Hide
          Daryn Sharp added a comment -

          I think it technically looks ok. However the now un-synchronized refresh methods are rarely if ever called via an admin command. Whereas protocolToAcl is accessed for every client connection. Volatiles are typically not "free", so have you benchmarked the change?

          Show
          Daryn Sharp added a comment - I think it technically looks ok. However the now un-synchronized refresh methods are rarely if ever called via an admin command. Whereas protocolToAcl is accessed for every client connection. Volatiles are typically not "free", so have you benchmarked the change?
          Hide
          Benoy Antony added a comment -

          We had a an error where authorization policy checks returned inconsistent results on a cluster.
          Looking at the code , that inconsistency is possible due to the visibility issues of the code.

          As you commented, there is a cost for volatile vs non-volatile. The performance of volatile depends heavily on the CPU architecture . http://stackoverflow.com/questions/4633866/is-volatile-expensive .

          I'll try benchmarking the change, but not sure if that's very useful since performance of volatile varies wildly across CPU architectures. I believe, the main decision will be correctness vs performance.

          Show
          Benoy Antony added a comment - We had a an error where authorization policy checks returned inconsistent results on a cluster. Looking at the code , that inconsistency is possible due to the visibility issues of the code. As you commented, there is a cost for volatile vs non-volatile. The performance of volatile depends heavily on the CPU architecture . http://stackoverflow.com/questions/4633866/is-volatile-expensive . I'll try benchmarking the change, but not sure if that's very useful since performance of volatile varies wildly across CPU architectures. I believe, the main decision will be correctness vs performance.
          Hide
          Daryn Sharp added a comment -

          Agreed on correct vs performance. Any update on performance though? My concern relates to large clusters seeing bursts of thousands of connections per second. I wouldn't expect much if any measurable impact, but stranger things have happened. The test would need to stress 1 rpc connections to avoid diluting the results.

          Show
          Daryn Sharp added a comment - Agreed on correct vs performance. Any update on performance though? My concern relates to large clusters seeing bursts of thousands of connections per second. I wouldn't expect much if any measurable impact, but stranger things have happened. The test would need to stress 1 rpc connections to avoid diluting the results.
          Benoy Antony made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Benoy Antony added a comment -

          Attaching the class using which I ran the performance tests
          Here an RPC server is created with 10 different protocol classes. The different protocols are required to cause simultaneous connection requests.
          The server sets acls on porticols such that authorization always fails. If any authorization is successful, then the successful connection will be cached and will not invoke authorize () again.

          The server is started with 4 reader threads and 4 handler threads.
          10 client threads are started. Each of the threads make 1000 calls simultaneously.
          The above is repeated 10 times and the total duration for each iteration is recorded and average is also calculated.
          The tests are executed on 2.6 GHz Intel Core i7 (8 core) , 16 GB 1600 MHz DDR3. Set log level to Error to remove any logging.
          The readings on for the following conditions are below:

          current code

          The different times in milliseconds are:
          7350 , 23234 , 30897 , 18634 , 32112 , 18905 , 35371 , 16758 , 32161 , 22113 .
          The average time in milliseconds is: 23754

          synchronized authorize

          The different times in milliseconds are:
          6101 , 25809 , 26170 , 22700 , 28617 , 24078 , 28440 , 23569 , 27246 , 25882 .
          The average time in milliseconds is: 23861

          volatile protocolToAcl

          The different times in milliseconds are:
          7256 , 25995 , 29741 , 19679 , 30089 , 22581 , 27695 , 22877 , 27267 , 23630 .
          The average time in milliseconds is: 23681

          Observations: There are no noticeable difference between unsynchronized/synchronized/volatile cases. Triggering the exception (AuthorizationException) , may be expensive and hence differences between unsynchronized/synchronized/volatile becomes negligible compared to the exception operations.

          Show
          Benoy Antony added a comment - Attaching the class using which I ran the performance tests Here an RPC server is created with 10 different protocol classes. The different protocols are required to cause simultaneous connection requests. The server sets acls on porticols such that authorization always fails. If any authorization is successful, then the successful connection will be cached and will not invoke authorize () again. The server is started with 4 reader threads and 4 handler threads. 10 client threads are started. Each of the threads make 1000 calls simultaneously. The above is repeated 10 times and the total duration for each iteration is recorded and average is also calculated. The tests are executed on 2.6 GHz Intel Core i7 (8 core) , 16 GB 1600 MHz DDR3. Set log level to Error to remove any logging. The readings on for the following conditions are below: current code The different times in milliseconds are: 7350 , 23234 , 30897 , 18634 , 32112 , 18905 , 35371 , 16758 , 32161 , 22113 . The average time in milliseconds is: 23754 synchronized authorize The different times in milliseconds are: 6101 , 25809 , 26170 , 22700 , 28617 , 24078 , 28440 , 23569 , 27246 , 25882 . The average time in milliseconds is: 23861 volatile protocolToAcl The different times in milliseconds are: 7256 , 25995 , 29741 , 19679 , 30089 , 22581 , 27695 , 22877 , 27267 , 23630 . The average time in milliseconds is: 23681 Observations: There are no noticeable difference between unsynchronized/synchronized/volatile cases. Triggering the exception (AuthorizationException) , may be expensive and hence differences between unsynchronized/synchronized/volatile becomes negligible compared to the exception operations.
          Benoy Antony made changes -
          Attachment performancetest.patch [ 12647326 ]
          Benoy Antony made changes -
          Attachment performancetest.patch [ 12647326 ]
          Benoy Antony made changes -
          Attachment performancetest.patch [ 12647327 ]
          Hide
          Benoy Antony added a comment -

          Daryn Sharp, Did you get a chance to review the performance findings ?

          Show
          Benoy Antony added a comment - Daryn Sharp , Did you get a chance to review the performance findings ?
          Benoy Antony made changes -
          Link This issue is related to HADOOP-10648 [ HADOOP-10648 ]
          Hide
          Benoy Antony added a comment -

          Running performance test without RPC call.
          The performance test class is attached

          10 client threads are started. Each of the threads make 1000 authorize simultaneously. 
The above is repeated 10 times and the total duration for each iteration is recorded and average is also calculated.
The tests are executed on 2.6 GHz Intel Core i7 (8 core) , 16 GB 1600 MHz DDR3. Set log level to Error to remove any logging.
The readings on for the following conditions are below:
          current code

          The different times in milliseconds are: 448 , 202 , 165 , 225 , 160 , 171 , 233 , 158 , 156 , 155.
          The average time in milliseconds is: 207

          synchronized authorize

          The different times in milliseconds are: 461 , 227 , 190 , 256 , 168 , 210 , 266 , 169 , 166 , 176.
          The average time in milliseconds is: 229

          volatile protocolToAcl

          The different times in milliseconds are: 476 , 205 , 169 , 228 , 159 , 167 , 237 , 161 , 158 , 156.
          The average time in milliseconds is: 212

          Show
          Benoy Antony added a comment - Running performance test without RPC call . The performance test class is attached 10 client threads are started. Each of the threads make 1000 authorize simultaneously. 
The above is repeated 10 times and the total duration for each iteration is recorded and average is also calculated.
The tests are executed on 2.6 GHz Intel Core i7 (8 core) , 16 GB 1600 MHz DDR3. Set log level to Error to remove any logging.
The readings on for the following conditions are below: current code The different times in milliseconds are: 448 , 202 , 165 , 225 , 160 , 171 , 233 , 158 , 156 , 155. The average time in milliseconds is: 207 synchronized authorize The different times in milliseconds are: 461 , 227 , 190 , 256 , 168 , 210 , 266 , 169 , 166 , 176. The average time in milliseconds is: 229 volatile protocolToAcl The different times in milliseconds are: 476 , 205 , 169 , 228 , 159 , 167 , 237 , 161 , 158 , 156. The average time in milliseconds is: 212
          Benoy Antony made changes -
          Attachment performance-test-without-rpc.patch [ 12649444 ]
          Hide
          Benoy Antony added a comment -

          Daryn Sharp Will you be able to review the performance numbers ? Please let me know if anything else needs to be done.

          Show
          Benoy Antony added a comment - Daryn Sharp Will you be able to review the performance numbers ? Please let me know if anything else needs to be done.
          Hide
          Daryn Sharp added a comment -

          +1 I suppose a 2% degradation in a microbench for a method called once per connection is worth the correctness.

          Show
          Daryn Sharp added a comment - +1 I suppose a 2% degradation in a microbench for a method called once per connection is worth the correctness.
          Hide
          Benoy Antony added a comment -

          Thanks Daryn Sharp.
          Vinayakumar B , Could you please review and commit as well ?

          Show
          Benoy Antony added a comment - Thanks Daryn Sharp . Vinayakumar B , Could you please review and commit as well ?
          Hide
          Vinayakumar B added a comment -

          Thanks Benoy Antony for the finding and the patch.
          +1 from my side as well. Thanks Daryn Sharp for the review.
          I will commit the patch soon.

          Show
          Vinayakumar B added a comment - Thanks Benoy Antony for the finding and the patch. +1 from my side as well. Thanks Daryn Sharp for the review. I will commit the patch soon.
          Hide
          Vinayakumar B added a comment -

          Committed to trunk and branch-2.

          Show
          Vinayakumar B added a comment - Committed to trunk and branch-2.
          Vinayakumar B made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 2.5.0 [ 12326263 ]
          Resolution Fixed [ 1 ]
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #5721 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5721/)
          HADOOP-10590. ServiceAuthorizationManager is not threadsafe. (Contributed by Benoy Antony) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603356)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #5721 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5721/ ) HADOOP-10590 . ServiceAuthorizationManager is not threadsafe. (Contributed by Benoy Antony) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603356 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #587 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/587/)
          HADOOP-10590. ServiceAuthorizationManager is not threadsafe. (Contributed by Benoy Antony) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603356)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #587 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/587/ ) HADOOP-10590 . ServiceAuthorizationManager is not threadsafe. (Contributed by Benoy Antony) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603356 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1778 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1778/)
          HADOOP-10590. ServiceAuthorizationManager is not threadsafe. (Contributed by Benoy Antony) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603356)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1778 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1778/ ) HADOOP-10590 . ServiceAuthorizationManager is not threadsafe. (Contributed by Benoy Antony) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603356 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1805 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1805/)
          HADOOP-10590. ServiceAuthorizationManager is not threadsafe. (Contributed by Benoy Antony) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603356)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1805 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1805/ ) HADOOP-10590 . ServiceAuthorizationManager is not threadsafe. (Contributed by Benoy Antony) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603356 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/ServiceAuthorizationManager.java
          Karthik Kambatla (Inactive) made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Benoy Antony
              Reporter:
              Benoy Antony
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development