Hadoop Common
  1. Hadoop Common
  2. HADOOP-10475

ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken()

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: security
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      While running a hive job on a HA cluster saw ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken()

      1. HADOOP-10475.002.patch
        3 kB
        Jing Zhao
      2. HADOOP-10475.001.patch
        1 kB
        Jing Zhao
      3. HADOOP-10475.000.patch
        1 kB
        Jing Zhao

        Issue Links

          Activity

          Hide
          Arpit Gupta added a comment -
          Caused by: java.util.ConcurrentModificationException
          	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
          	at java.util.HashMap$ValueIterator.next(HashMap.java:922)
          	at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1067)
          	at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSelector.selectToken(AbstractDelegationTokenSelector.java:53)
          	at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:260)
          	at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>
          
          Show
          Arpit Gupta added a comment - Caused by: java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894) at java.util.HashMap$ValueIterator.next(HashMap.java:922) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1067) at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSelector.selectToken(AbstractDelegationTokenSelector.java:53) at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:260) at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>
          Hide
          Jing Zhao added a comment -

          Looks like the issue is that AbstractDelegationTokenSelector#selectToken tries to iterate the tokensMap, while another thread tries to add a new token to the UGI. The monitor of UGI cannot prevent this race, since the UserGroupInformation#getTokens returns the whole token map.

          Show
          Jing Zhao added a comment - Looks like the issue is that AbstractDelegationTokenSelector#selectToken tries to iterate the tokensMap, while another thread tries to add a new token to the UGI. The monitor of UGI cannot prevent this race, since the UserGroupInformation#getTokens returns the whole token map.
          Hide
          Jing Zhao added a comment -

          Currently we directly get the whole token map through UserGroupInformation#getTokens, and pass it around for works like token selection. Instead of doing this, we may want to provide a different API in UGI class which takes token selector as parameter and process the token selection.

          Another quick and simple fix is to use a ConcurrentMap for Credentials#tokenMap.

          Show
          Jing Zhao added a comment - Currently we directly get the whole token map through UserGroupInformation#getTokens, and pass it around for works like token selection. Instead of doing this, we may want to provide a different API in UGI class which takes token selector as parameter and process the token selection. Another quick and simple fix is to use a ConcurrentMap for Credentials#tokenMap.
          Hide
          Jing Zhao added a comment -

          The patch with the simple fix using ConcurrentHashMap. This may bring us some performance penalty.

          Show
          Jing Zhao added a comment - The patch with the simple fix using ConcurrentHashMap. This may bring us some performance penalty.
          Hide
          Haohui Mai added a comment -

          Let's put a comment there to show why ConcurrentHashMap is used. +1 after address it.

          Show
          Haohui Mai added a comment - Let's put a comment there to show why ConcurrentHashMap is used. +1 after address it.
          Hide
          Jing Zhao added a comment -

          Thanks for the quick review, Haohui Mai! Update the patch to address your comments.

          Show
          Jing Zhao added a comment - Thanks for the quick review, Haohui Mai ! Update the patch to address your comments.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12639306/HADOOP-10475.001.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.security.TestUserGroupInformation
          org.apache.hadoop.fs.TestFileSystemCaching

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3765//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3765//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639306/HADOOP-10475.001.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.security.TestUserGroupInformation org.apache.hadoop.fs.TestFileSystemCaching +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3765//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3765//console This message is automatically generated.
          Hide
          Jing Zhao added a comment -

          Tests failed because ConcurrentHashMap does not support null key. I'm not sure why we want to allow null key for token map. I will post a patch which simply reverts the corresponding change from HADOOP-8726.

          Show
          Jing Zhao added a comment - Tests failed because ConcurrentHashMap does not support null key. I'm not sure why we want to allow null key for token map. I will post a patch which simply reverts the corresponding change from HADOOP-8726 .
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12639332/HADOOP-10475.002.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3768//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3768//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12639332/HADOOP-10475.002.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3768//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3768//console This message is automatically generated.
          Hide
          Haohui Mai added a comment -

          Reverting HADOOP-8726 looks good to me. +1

          Show
          Haohui Mai added a comment - Reverting HADOOP-8726 looks good to me. +1
          Hide
          Jing Zhao added a comment -

          I've committed this to trunk and branch-2. Thanks to Haohui for review!

          Show
          Jing Zhao added a comment - I've committed this to trunk and branch-2. Thanks to Haohui for review!
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5473 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5473/)
          HADOOP-10475. ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken(). Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1585888)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Credentials.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #5473 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5473/ ) HADOOP-10475 . ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken(). Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1585888 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Credentials.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #534 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/534/)
          HADOOP-10475. ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken(). Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1585888)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Credentials.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #534 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/534/ ) HADOOP-10475 . ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken(). Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1585888 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Credentials.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1752 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1752/)
          HADOOP-10475. ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken(). Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1585888)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Credentials.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1752 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1752/ ) HADOOP-10475 . ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken(). Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1585888 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Credentials.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1726 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1726/)
          HADOOP-10475. ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken(). Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1585888)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Credentials.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1726 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1726/ ) HADOOP-10475 . ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken(). Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1585888 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Credentials.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
          Hide
          Daryn Sharp added a comment -

          While the patch in this jira looks ok, I haven't had to chance to see if my HADOOP-8726 was truly reverted.

          At any rate, please don't file and resolve security related tickets within ~6h - let alone contemplate reverting jiras w/o input from the assignee. I have had to fix many delicate issues over the years and can state from personal experience that small changes can cause subtle bugs.

          Show
          Daryn Sharp added a comment - While the patch in this jira looks ok, I haven't had to chance to see if my HADOOP-8726 was truly reverted. At any rate, please don't file and resolve security related tickets within ~6h - let alone contemplate reverting jiras w/o input from the assignee. I have had to fix many delicate issues over the years and can state from personal experience that small changes can cause subtle bugs.
          Hide
          Jing Zhao added a comment -

          that small changes can cause subtle bugs

          Yes, totally agree. Actually this bug was introduced by a small change in HADOOP-8726 (see the change between https://issues.apache.org/jira/secure/attachment/12542820/HADOOP-8726.patch and https://issues.apache.org/jira/secure/attachment/12542928/HADOOP-8726.patch). We should be more careful when checking in small changes in the future.

          Show
          Jing Zhao added a comment - that small changes can cause subtle bugs Yes, totally agree. Actually this bug was introduced by a small change in HADOOP-8726 (see the change between https://issues.apache.org/jira/secure/attachment/12542820/HADOOP-8726.patch and https://issues.apache.org/jira/secure/attachment/12542928/HADOOP-8726.patch ). We should be more careful when checking in small changes in the future.
          Hide
          Daryn Sharp added a comment -

          I don't think this patch fixes the problem. The ArrayList constructor is also iterating over the values in the token map to make a copy, so it too can encounter the CME. The race is now smaller but the problem remains.

          Show
          Daryn Sharp added a comment - I don't think this patch fixes the problem. The ArrayList constructor is also iterating over the values in the token map to make a copy, so it too can encounter the CME. The race is now smaller but the problem remains.
          Hide
          Jing Zhao added a comment -

          From the UGI point of view, getTokens and addToken are both protected by the UGI's monitor, thus can protect the race in UGI?

          Show
          Jing Zhao added a comment - From the UGI point of view, getTokens and addToken are both protected by the UGI's monitor, thus can protect the race in UGI?
          Hide
          Daryn Sharp added a comment -

          No, UGI synchronization is not trustworthy. Calling UGI.getCurrentUser() twice will return you two distinct objects which means synchronization is often worthless. I've contemplated multiple ways to fix this but haven't had time to devote to the task. In the scope of this jira, you may want to synch on the Subject which is indeed shared between UGI instances.

          Show
          Daryn Sharp added a comment - No, UGI synchronization is not trustworthy. Calling UGI.getCurrentUser() twice will return you two distinct objects which means synchronization is often worthless. I've contemplated multiple ways to fix this but haven't had time to devote to the task. In the scope of this jira, you may want to synch on the Subject which is indeed shared between UGI instances.
          Hide
          Jing Zhao added a comment -

          Calling UGI.getCurrentUser() twice will return you two distinct objects which means synchronization is often worthless

          Hmm, you're right. I will create a separate jira to track this.

          Show
          Jing Zhao added a comment - Calling UGI.getCurrentUser() twice will return you two distinct objects which means synchronization is often worthless Hmm, you're right. I will create a separate jira to track this.
          Hide
          Jing Zhao added a comment -

          I've created HADOOP-10489. Thanks for pointing out the issue, Daryn Sharp! Please feel free to assign that jira to yourself.

          Show
          Jing Zhao added a comment - I've created HADOOP-10489 . Thanks for pointing out the issue, Daryn Sharp ! Please feel free to assign that jira to yourself.

            People

            • Assignee:
              Jing Zhao
              Reporter:
              Arpit Gupta
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development