Hadoop Common / HADOOP-8828

Support distcp from secure to insecure clusters

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Users currently can't distcp from secure to insecure clusters.

      Relevant background from ATM:

      There's no plumbing to make the HFTP client use AuthenticatedURL when security is enabled. This means that even though you have the servlet filter correctly configured on the server, the client doesn't know how to properly authenticate to that filter.
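      In code, the missing plumbing would look roughly like the sketch below: it opens an HFTP servlet URL through hadoop-auth's AuthenticatedURL so the client performs SPNEGO against the server-side filter. This is a hedged illustration, not code from this issue; the class name, hostname, port, and path are assumptions.

      import java.net.HttpURLConnection;
      import java.net.URL;
      import org.apache.hadoop.security.authentication.client.AuthenticatedURL;

      public class HftpSpnegoSketch {
        public static void main(String[] args) throws Exception {
          // Authenticate to the NN's servlet filter via SPNEGO before issuing
          // the HFTP data request (the URL below is illustrative).
          AuthenticatedURL.Token token = new AuthenticatedURL.Token();
          URL url = new URL("http://secure-nn.example.com:50070/data/user/foo/part-00000");
          HttpURLConnection conn = new AuthenticatedURL().openConnection(url, token);
          System.out.println("HTTP " + conn.getResponseCode());
        }
      }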

      The crux of the issue is that security is enabled globally instead of per file system. The trick of using HFTP as the source FS works when the source is insecure, but not when the source is secure.

      Normal cp with two hdfs:// URLs can be made to work. There is indeed logic in o.a.h.ipc.Client to fall back to simple authentication if your client config has security enabled (hadoop.security.authentication set to "kerberos") and the server responds with a simple-authentication response. The thing is, there are at least three bugs with this that I bumped into. All three can be worked around.

      1) If your client config has security enabled you must have a valid Kerberos TGT, even if you're interacting with an insecure cluster. The hadoop client unfortunately tries to read the local ticket cache before it tries to connect to the server, and so doesn't know that it won't need Kerberos credentials.

      2) Even though the destination NN is insecure, it has to have a Kerberos principal created for it. You don't need a keytab, and you don't need to change any settings on the destination NN. The principal just needs to exist in the principal database. This is again because the hadoop client will, before connecting to the remote NN, try to get a service ticket for the hdfs/f.q.d.n principal for the remote NN. If this fails, it won't even get to the part where it tries to connect to the insecure NN and falls back to simple auth.

      3) Once you get through problems 1 and 2, you will try to connect to the remote, insecure NN. This will work, but the reported principal name of your user will include a realm that the remote NN doesn't know about. You will either need to change the default_realm setting in /etc/krb5.conf on the insecure NN to be the same as the secure NN, or you will need to add some custom hadoop.security.auth_to_local mappings on the insecure NN so it knows how to translate this long principal name into a short name.
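      Workarounds 1 and 3 amount to roughly the following sketch. It is illustrative only: the realm, the RULE pattern, and the class name are assumptions, and in practice hadoop.security.auth_to_local belongs in the insecure NN's core-site.xml rather than in client code. Workaround 2 (creating the hdfs/f.q.d.n principal) is a KDC administration step with no code counterpart.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.security.UserGroupInformation;

      public class SecureClientWorkaroundsSketch {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();

          // Workaround 1: with kerberos enabled, the client reads the local
          // ticket cache at login, so a valid TGT (from kinit) is required
          // even though the destination cluster is insecure.
          conf.set("hadoop.security.authentication", "kerberos");
          UserGroupInformation.setConfiguration(conf);
          UserGroupInformation ugi = UserGroupInformation.getCurrentUser(); // fails without a TGT

          // Workaround 3: the insecure NN needs a mapping like this to turn
          // long principal names from the secure realm (SECURE.EXAMPLE.COM is
          // assumed) into short names. Shown here only to illustrate the
          // property value; set it in the NN's core-site.xml.
          conf.set("hadoop.security.auth_to_local",
              "RULE:[1:$1@$0](.*@SECURE.EXAMPLE.COM)s/@.*//\n" +
              "DEFAULT");

          System.out.println("Logged in as " + ugi.getUserName());
        }
      }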

      Even with all these changes, distcp still won't work, since the first thing it tries to do when submitting the job is to get a delegation token for all the involved NNs, and that fails because the insecure NN isn't running a DT secret manager. I haven't been able to figure out a way around this, except to make a custom distcp that skips this step.
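      For reference, the token-fetching step that blocks distcp is essentially the call below, which MapReduce job submission uses to collect delegation tokens for every NN a job touches. A hedged sketch; the paths and hostnames are assumptions.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapreduce.security.TokenCache;
      import org.apache.hadoop.security.Credentials;

      public class TokenFetchSketch {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Credentials creds = new Credentials();
          // distcp's job submission effectively does this for the source and
          // target paths; the insecure NN runs no DT secret manager, so
          // fetching a token for its path cannot succeed.
          TokenCache.obtainTokensForNamenodes(creds, new Path[] {
              new Path("hdfs://secure-nn.example.com:8020/src"),
              new Path("hdfs://insecure-nn.example.com:8020/dst")
          }, conf);
        }
      }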

        Issue Links

          Activity

          Eli Collins created issue -
          Eli Collins made changes -
          Project: Hadoop HDFS [ 12310942 ] → Hadoop Common [ 12310240 ]
          Key: HDFS-3712 → HADOOP-8828
          Daryn Sharp made changes -
          Link This issue is related to HDFS-3905 [ HDFS-3905 ]
          Daryn Sharp added a comment -

          On which versions are you observing this behavior?

          Eli Collins made changes -
          Comment [ Thanks Eli.

          Do you know why this is prioritized as "major"? I can conceive of cases
          (like test cases) where this would be nice to have, but asking customers
          about security setups, I can't think of one who said they can't have both
          clusters secure.

          Hope you've been well, and btw, the 'nox just opened up a new wing...not
          sure if you caught that...

          Basier Aziz | Director, Product Management | Cloudera
          ]
          Eli Collins added a comment -

          I believe this was 2.0 or a trunk build around that time.

          Eli Collins added a comment -

          Btw, here's the current failure mode for secure-to-insecure using hftp and webhdfs (thanks to Stephen Chu):

          Caused by: java.io.IOException: Couldn't setup connection for schu@HAL.CLOUDERA.COM to hdfs/c1204.hal.cloudera.com@HAL.CLOUDERA.COM
          at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:540)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
          at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:512)
          at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:596)
          at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:220)
          at org.apache.hadoop.ipc.Client.getConnection(Client.java:1213)
          at org.apache.hadoop.ipc.Client.call(Client.java:1140)
          ... 22 more
          Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]
          at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
          at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:137)
          at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:423)
          at org.apache.hadoop.ipc.Client$Connection.access$1300(Client.java:220)
          at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:589)
          at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:586)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
          at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:585)
          ... 25 more
          Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)
          at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:663)
          at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:230)
          at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
          at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
          ... 34 more
          Caused by: KrbException: Server not found in Kerberos database (7) - UNKNOWN_SERVER
          at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:64)
          at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:185)
          at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:294)
          at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:106)
          at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:557)
          at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:594)
          ... 37 more
          Caused by: KrbException: Identifier doesn't match expected value (906)
          at sun.security.krb5.internal.KDCRep.init(KDCRep.java:133)
          at sun.security.krb5.internal.TGSRep.init(TGSRep.java:58)
          at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:53)
          at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:46)
          ... 42 more

          Daryn Sharp added a comment -

          Owen and I were just discussing, a few weeks ago, whether we should always attempt a kerberos connection if the user has kerberos credentials. It's odd that we assume all clusters are insecure if the local cluster is insecure. It should be a simple and compatible change, because an insecure NN's SASL negotiation will kick the client back to simple auth if presented with kerberos.

          Owen O'Malley added a comment -

          To copy data from a secure cluster to an insecure cluster, run the distcp on the secure cluster. It would be a security problem to run the distcp on the insecure cluster, because you would need tokens for the secure cluster and they wouldn't be protected by the insecure cluster.

          Of course that means you'll need to use webhdfs or hdfs rpc to perform the write, but that is far better than encouraging users to open up security holes in their secure clusters.
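          In code, the suggested direction looks roughly like the sketch below, using the Hadoop 2-era programmatic DistCp API: run from the secure cluster, writing to the insecure cluster over webhdfs:// so no secure-cluster tokens reach the insecure side. Hostnames and ports are assumptions, not taken from this issue.

          import java.util.Arrays;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.tools.DistCp;
          import org.apache.hadoop.tools.DistCpOptions;

          public class SecureSideDistCpSketch {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Run on the secure cluster; the insecure target is written over
              // webhdfs, per the suggestion above.
              DistCpOptions options = new DistCpOptions(
                  Arrays.asList(new Path("hdfs://secure-nn.example.com:8020/src")),
                  new Path("webhdfs://insecure-nn.example.com:50070/dst"));
              new DistCp(conf, options).execute(); // submits the MR job and blocks until it finishes
            }
          }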

          Owen O'Malley made changes -
          Status: Open [ 1 ] → Resolved [ 5 ]
          Resolution: Not A Problem [ 8 ]
          Daryn Sharp added a comment -

          FYI, this problem should be addressed by SASL changes in RPCv9. The client won't attempt kerberos if the server doesn't support it.

          Haohui Mai made changes -
          Link This issue relates to HADOOP-10016 [ HADOOP-10016 ]
          Haohui Mai added a comment -

          I'm reopening this JIRA, as we found some subtle problems that are addressed in HADOOP-10016 and HADOOP-10017.

          I tested the setup of copying from a secure Hadoop 1 cluster to an insecure Hadoop 2 cluster using distcp, and found two subtle problems:

          1. There is an NPE in trunk which fails the distcp job (addressed in HADOOP-10017).
          2. When copying from a secure Hadoop 1 cluster to an insecure Hadoop 2 cluster, the Hadoop 1 cluster cannot handle this case because it lacks negotiation and fallback mechanisms during authentication (addressed in HADOOP-10016).
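          For readers landing here later: on the Hadoop 2 side, the client's fallback to simple auth ended up gated behind a configuration switch. A minimal sketch, assuming the ipc.client.fallback-to-simple-auth-allowed property available in later 2.x releases; it is off by default, since silently falling back would weaken a secure client's guarantees.

          import org.apache.hadoop.conf.Configuration;

          public class FallbackSwitchSketch {
            public static void main(String[] args) {
              Configuration conf = new Configuration();
              // Allow a kerberos-enabled client to fall back to simple auth
              // when the remote (insecure) server does not negotiate Kerberos.
              conf.setBoolean("ipc.client.fallback-to-simple-auth-allowed", true);
            }
          }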

          Haohui Mai made changes -
          Resolution: Not A Problem [ 8 ]
          Status: Resolved [ 5 ] → Reopened [ 4 ]
          Assignee: Haohui Mai [ wheat9 ]
          Haohui Mai made changes -
          Link This issue incorporates HADOOP-10017 [ HADOOP-10017 ]
          Haohui Mai made changes -
          Link This issue incorporates HADOOP-10016 [ HADOOP-10016 ]
          Haohui Mai made changes -
          Link This issue relates to HADOOP-10016 [ HADOOP-10016 ]

            People

            • Assignee: Haohui Mai
            • Reporter: Eli Collins
            • Votes: 4
            • Watchers: 23
