Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.0.3-alpha
    • Fix Version/s: 2.0.3-alpha
    • Component/s: ipc
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Target Version/s:

      Description

      HADOOP-9015 inadvertently removed a doAs block around instantiation of the sasl server which renders a server incapable of accepting kerberized connections.

      1. HADOOP-9070.patch
        2 kB
        Daryn Sharp
      2. HADOOP-9070.patch
        3 kB
        Daryn Sharp
      3. HADOOP-9070.patch
        3 kB
        Daryn Sharp

        Issue Links

          Activity

          Hide
          Arun C Murthy added a comment -

          Closing again.

          Show
          Arun C Murthy added a comment - Closing again.
          Hide
          Suresh Srinivas added a comment -

          I understand the desire to avoid wire incompat, and I would 100% agree if this was 2.1 or 2.2. I'd make the case that alpha 2.0 is the time to make changes to support future work on the 2.x branch.

          +1. I have been making the same point in many many jiras, all of which are mainly blocked due to CDH4.

          If a simple compatible change is made as an alternate to this change, I am okay. Any thing that adds unnecessary complexity will be vetoed by me.

          Show
          Suresh Srinivas added a comment - I understand the desire to avoid wire incompat, and I would 100% agree if this was 2.1 or 2.2. I'd make the case that alpha 2.0 is the time to make changes to support future work on the 2.x branch. +1. I have been making the same point in many many jiras, all of which are mainly blocked due to CDH4. If a simple compatible change is made as an alternate to this change, I am okay. Any thing that adds unnecessary complexity will be vetoed by me.
          Hide
          Daryn Sharp added a comment -

          Yes, I believe HADOOP-8999 is where the change (or bulk of it) was made. My memory is fuzzy, but I think there was an existing case where a malformed protobuf exception was generated when the client wasn't reading a final response. The change is largely intended to support PLAIN and/or other future SASL mechanisms, but it's definitely a bug that the client and server cannot be sure that SASL has completed.

          Show
          Daryn Sharp added a comment - Yes, I believe HADOOP-8999 is where the change (or bulk of it) was made. My memory is fuzzy, but I think there was an existing case where a malformed protobuf exception was generated when the client wasn't reading a final response. The change is largely intended to support PLAIN and/or other future SASL mechanisms, but it's definitely a bug that the client and server cannot be sure that SASL has completed.
          Hide
          Todd Lipcon added a comment -

          Hi Daryn. I'm happy to do the work to make it compatible... I'm assuming the other JIRA you're mentioning is HADOOP-8999? One thing I'm wondering: are the SASL negotiation improvements necessary/applicable for the DIGEST and GSS SASL mechanisms in use now? Or are they only important for future extensions in other SASL mechanisms (as mentioned in the description of 8999?)

          Show
          Todd Lipcon added a comment - Hi Daryn. I'm happy to do the work to make it compatible... I'm assuming the other JIRA you're mentioning is HADOOP-8999 ? One thing I'm wondering: are the SASL negotiation improvements necessary/applicable for the DIGEST and GSS SASL mechanisms in use now? Or are they only important for future extensions in other SASL mechanisms (as mentioned in the description of 8999?)
          Hide
          Daryn Sharp added a comment -

          Reverting this patch alone won't undo the version incompatibility. The SASL exchange was amended on another jira to send a final ack during SASL exchange. This ensured a symmetry that every client message received a response - instead of the client sometimes assuming that auth was successful. If the assumption was wrong, and the server sent an exception or switch to simple, it was misinterpreted as a malformed protobuf response to the first proxy call.

          I might be able to somehow maintain compatibility, but it's likely going to require hardcoded hacks.

          I understand the desire to avoid wire incompat, and I would 100% agree if this was 2.1 or 2.2. I'd make the case that alpha 2.0 is the time to make changes to support future work on the 2.x branch. I'm concerned that the larger goal of pluggable SASL mechanisms won't work w/o more hacks for which mechanisms do or don't send a final ack, which essentially means it's not going to be feasible in 2.x.

          Show
          Daryn Sharp added a comment - Reverting this patch alone won't undo the version incompatibility. The SASL exchange was amended on another jira to send a final ack during SASL exchange. This ensured a symmetry that every client message received a response - instead of the client sometimes assuming that auth was successful. If the assumption was wrong, and the server sent an exception or switch to simple, it was misinterpreted as a malformed protobuf response to the first proxy call. I might be able to somehow maintain compatibility, but it's likely going to require hardcoded hacks. I understand the desire to avoid wire incompat, and I would 100% agree if this was 2.1 or 2.2. I'd make the case that alpha 2.0 is the time to make changes to support future work on the 2.x branch. I'm concerned that the larger goal of pluggable SASL mechanisms won't work w/o more hacks for which mechanisms do or don't send a final ack, which essentially means it's not going to be feasible in 2.x.
          Hide
          Todd Lipcon added a comment -

          I missed this when it went in, since the original description didn't mention that this would change the wire format. Per my comments elsewhere, I don't think we can afford to break wire compatibility in 2.0.3. I'd like to revert this from branch-2, but also don't want to regress the bug. Daryn, did you have an idea on how to do this compatibly?

          Show
          Todd Lipcon added a comment - I missed this when it went in, since the original description didn't mention that this would change the wire format. Per my comments elsewhere, I don't think we can afford to break wire compatibility in 2.0.3. I'd like to revert this from branch-2, but also don't want to regress the bug. Daryn, did you have an idea on how to do this compatibly?
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1277 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1277/)
          HADOOP-9070. Kerberos SASL server cannot find kerberos key. Contributed by Daryn Sharp. (Revision 1417729)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1417729
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1277 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1277/ ) HADOOP-9070 . Kerberos SASL server cannot find kerberos key. Contributed by Daryn Sharp. (Revision 1417729) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1417729 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1246 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1246/)
          HADOOP-9070. Kerberos SASL server cannot find kerberos key. Contributed by Daryn Sharp. (Revision 1417729)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1417729
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1246 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1246/ ) HADOOP-9070 . Kerberos SASL server cannot find kerberos key. Contributed by Daryn Sharp. (Revision 1417729) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1417729 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3089 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3089/)
          HADOOP-9070. Kerberos SASL server cannot find kerberos key. Contributed by Daryn Sharp. (Revision 1417729)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1417729
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Show
          Hudson added a comment - Integrated in Hadoop-trunk-Commit #3089 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3089/ ) HADOOP-9070 . Kerberos SASL server cannot find kerberos key. Contributed by Daryn Sharp. (Revision 1417729) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1417729 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          Hide
          Aaron T. Myers added a comment -

          I've just committed this to trunk and branch-2 based on Bobby's +1 and my own testing.

          Show
          Aaron T. Myers added a comment - I've just committed this to trunk and branch-2 based on Bobby's +1 and my own testing.
          Hide
          Robert Joseph Evans added a comment -

          Seems good to me +1.

          Show
          Robert Joseph Evans added a comment - Seems good to me +1.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12554577/HADOOP-9070.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1796//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1796//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12554577/HADOOP-9070.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1796//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1796//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          bump rpc version

          Show
          Daryn Sharp added a comment - bump rpc version
          Hide
          Owen O'Malley added a comment -

          This looks good, but if the versions won't talk to each other we should bump the rpc version.

          Show
          Owen O'Malley added a comment - This looks good, but if the versions won't talk to each other we should bump the rpc version.
          Hide
          Daryn Sharp added a comment -

          Yes, I thoroughly tested both kerberos and tokens to ensure they work with the latest patch. As mentioned in my prior comment, this does cause a RPC incompatibility within 2.x. Earlier 2.x clients will receive an extra reply (SUCCESS) from a 2.0.3 server after the kerberos negotiation completes. The client will interpret this as the response for the next proxy call, which will cause a protobuf error. A 2.0.3 client will timeout waiting for the SUCCESS response from earlier 2.x servers. Maybe we should bump the RPC version in 2.0.3? Or if that's unpalatable, I can investigate a backwards compatible client change that might be hacky (not sure yet).

          Show
          Daryn Sharp added a comment - Yes, I thoroughly tested both kerberos and tokens to ensure they work with the latest patch. As mentioned in my prior comment, this does cause a RPC incompatibility within 2.x. Earlier 2.x clients will receive an extra reply (SUCCESS) from a 2.0.3 server after the kerberos negotiation completes. The client will interpret this as the response for the next proxy call, which will cause a protobuf error. A 2.0.3 client will timeout waiting for the SUCCESS response from earlier 2.x servers. Maybe we should bump the RPC version in 2.0.3? Or if that's unpalatable, I can investigate a backwards compatible client change that might be hacky (not sure yet).
          Hide
          Robert Joseph Evans added a comment -

          The new patch looks OK to me. Have you tested this on a secure cluster to be sure there isn't anything else lurking behind the scenes? Have you checked for backwards compatibility?

          Show
          Robert Joseph Evans added a comment - The new patch looks OK to me. Have you tested this on a secure cluster to be sure there isn't anything else lurking behind the scenes? Have you checked for backwards compatibility?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12554412/HADOOP-9070.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1782//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1782//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12554412/HADOOP-9070.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1782//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1782//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Original patch was not tested correctly... There is also a problem with the client & server knowing in all cases that SASL negotiation is complete. Due to an earlier change in HADOOP-8999 to stop PLAIN clients from hanging waiting for a response from the server, kerberos clients may now hang. In HADOOP-9034 there is general agreement that the server should always send a success response, so I removed the conditional to only return success for PLAIN. I'd like to leave HADOOP-9034 open to implement a more robust solution. Note this patch causes RPC incompatibility, but I don't think this is a problem since 2.x is RPC incompatible with earlier releases anyway.

          Show
          Daryn Sharp added a comment - Original patch was not tested correctly... There is also a problem with the client & server knowing in all cases that SASL negotiation is complete. Due to an earlier change in HADOOP-8999 to stop PLAIN clients from hanging waiting for a response from the server, kerberos clients may now hang. In HADOOP-9034 there is general agreement that the server should always send a success response, so I removed the conditional to only return success for PLAIN. I'd like to leave HADOOP-9034 open to implement a more robust solution. Note this patch causes RPC incompatibility, but I don't think this is a problem since 2.x is RPC incompatible with earlier releases anyway.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12554369/HADOOP-9070.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1780//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1780//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12554369/HADOOP-9070.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1780//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1780//console This message is automatically generated.
          Hide
          Robert Joseph Evans added a comment -

          The change looks good to me, assuming Jenkins is OK with it. +1

          Show
          Robert Joseph Evans added a comment - The change looks good to me, assuming Jenkins is OK with it. +1
          Hide
          Daryn Sharp added a comment -

          Restore the doAs. I can't add tests, but this has been verified to work on a secure cluster that experienced the failure.

          Show
          Daryn Sharp added a comment - Restore the doAs . I can't add tests, but this has been verified to work on a secure cluster that experienced the failure.

            People

            • Assignee:
              Daryn Sharp
              Reporter:
              Daryn Sharp
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development