SLIDER-1027

Add a kdiag command for Kerberos diagnostics

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Slider 0.90.2
    • Fix Version/s: Slider 0.90.2
    • Component/s: client
    • Labels:
      None
    • Environment:

      Kerberos

    • Sprint:
      Slider September #1

      Description

      Trying to debug Kerberos problems is a painful exercise.

      Add a Kerberos diagnostics command, `kdiag`, to aid this; output to stdout or to a named file.

      Add other args as appropriate, e.g.

      • `--required`: fail with exit code if no login
      • `--zookeeper`: check ZooKeeper details
      • `--hdfs`: check HDFS
      • `--yarn`: check YARN login
      • `--keytab` + keytab name: check for accessibility, contents
      Attachments

      1. out.txt
        14 kB
        Steve Loughran

          Activity

          stevel@apache.org Steve Loughran added a comment -

          This could go on forever. Add the basics for now.

          Also, UGI could interfere with the process. The assumption here is that unless a keytab is set, the user credentials will be picked up.

          stevel@apache.org Steve Loughran added a comment -

          Core is implemented.

          Features

          1. Lives in the Hadoop security package, which is needed to reset the renewal time. The long-term goal would be HADOOP-12426, moving it into Hadoop core.
          2. Cranks up some of the JRE diagnostics (and doesn't crank them down afterwards); these go to stderr.
          3. Doesn't actually attempt to connect to any services (RM, RM proxy, HDFS, ZK ...). That couldn't go into a hadoop-core feature.
          4. Dumps out the various env vars, sysprops and Hadoop options related to security.
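The dump described in the last feature can be sketched roughly as follows. This is a minimal illustration, not the KerberosDiags source: the class name `SecurityStateDump` and its methods are hypothetical, while the property and variable names and the `(unset)` placeholder are copied from the diagnostics output quoted later in this issue. The Hadoop configuration options are left out so that the sketch needs no Hadoop dependency.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; this is not the KerberosDiags source. */
final class SecurityStateDump {

  /** Placeholder printed for missing values, as in the output quoted in this issue. */
  static final String UNSET = "(unset)";

  // Names copied from the diagnostics output quoted in this issue.
  static final List<String> SYSPROPS = Arrays.asList(
      "java.security.krb5.conf",
      "java.security.krb5.realm",
      "sun.security.krb5.debug",
      "sun.security.spnego.debug");

  static final List<String> ENV_VARS = Arrays.asList(
      "HADOOP_JAAS_DEBUG",
      "KRB5CCNAME",
      "HADOOP_USER_NAME",
      "HADOOP_PROXY_USER",
      "HADOOP_TOKEN_FILE_LOCATION");

  /** Render one entry as: name = "value", substituting (unset) for null. */
  static String format(String name, String value) {
    return name + " = \"" + (value == null ? UNSET : value) + "\"";
  }

  /** Build the two-section report from the live JVM and process environment. */
  static String report() {
    StringBuilder sb = new StringBuilder();
    sb.append("System Properties\n\n");
    for (String name : SYSPROPS) {
      sb.append(format(name, System.getProperty(name))).append('\n');
    }
    sb.append("\nEnvironment Variables\n\n");
    for (String name : ENV_VARS) {
      sb.append(format(name, System.getenv(name))).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(report());
  }
}
```

Printing a literal `(unset)` rather than an empty string makes it obvious at a glance which settings were never supplied, which is usually the first thing to check when a login fails.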

          To add

          1. Doesn't have any tests: needs a secure cluster for this. I should add an integration test.
          2. The --fail option is meant to trigger a '41' exit code on auth failures, but it's not complete.
          3. No network diagnostics (DNS to KDC). This would need parsing of the krb conf file, or just adding a --kdc host probe.
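The DNS half of the proposed host probe might look something like the sketch below. It is an assumption about how such a probe could work, since the comment above says it has not been written: the class `KdcHostProbe` and its `resolve` method are hypothetical names, and the host would have to be supplied by the caller because parsing the krb conf file is out of scope here. Only the standard `java.net.InetAddress` lookup is used.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical probe class, named here for illustration only. */
final class KdcHostProbe {

  /** Resolve a host name to its addresses; an empty list means the lookup failed. */
  static List<String> resolve(String host) {
    List<String> addresses = new ArrayList<String>();
    try {
      for (InetAddress address : InetAddress.getAllByName(host)) {
        addresses.add(address.getHostAddress());
      }
    } catch (UnknownHostException e) {
      // Lookup failed: report "no addresses" rather than throwing.
    }
    return addresses;
  }

  public static void main(String[] args) {
    // The host to probe is an argument; "localhost" is only a stand-in default.
    String host = args.length > 0 ? args[0] : "localhost";
    List<String> addresses = resolve(host);
    if (addresses.isEmpty()) {
      System.out.println(host + ": DNS lookup failed");
    } else {
      System.out.println(host + " resolves to " + addresses);
    }
  }
}
```

This only checks name resolution. It says nothing about whether a KDC is actually listening on the resolved address, which a fuller probe would also need to test.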
          stevel@apache.org Steve Loughran added a comment -

          Attached: the output of a diagnostics run with hadoop.security logging at DEBUG and HADOOP_JAAS_DEBUG=true; kdiag does not set either of these itself.

          slider kdiag --keytab $CLUSTER_DIR/keytabs/zk.service.keytab --principal zookeeper/devix.cotham.uk > target/out.txt 2>&1
          
          jira-bot ASF subversion and git services added a comment -

          Commit 1720369 from Steve Loughran in branch 'site/trunk'
          [ https://svn.apache.org/r1720369 ]

          SLIDER-1027 add a kdiag command for kerberos diagnostics

          jira-bot ASF subversion and git services added a comment -

          Commit 1720370 from Steve Loughran in branch 'site/trunk'
          [ https://svn.apache.org/r1720370 ]

          SLIDER-1027 add a kdiag command for kerberos diagnostics

          stevel@apache.org Steve Loughran added a comment -

          And the outcome of KDiagCommandIT against a secure cluster where the user has no TGT:

          2015-12-16 16:08:27,224 [main] DEBUG security.UserGroupInformation (loginUserFromSubject(825)) - UGI loginUser:stevel (auth:KERBEROS)
          2015-12-16 16:08:27,283 [main] ERROR client.SliderClient (actionKDiag(3801)) - org.apache.hadoop.security.KerberosDiags$KerberosDiagsFailure: Login user: No kerberos credentials for  stevel (auth:KERBEROS)
          2015-12-16 16:08:27,284 [main] DEBUG client.SliderClient (actionKDiag(3802)) - org.apache.hadoop.security.KerberosDiags$KerberosDiagsFailure: Login user: No kerberos credentials for  stevel (auth:KERBEROS)
          org.apache.hadoop.security.KerberosDiags$KerberosDiagsFailure: Login user: No kerberos credentials for  stevel (auth:KERBEROS)
          	at org.apache.hadoop.security.KerberosDiags.fail(KerberosDiags.java:297)
          	at org.apache.hadoop.security.KerberosDiags.failif(KerberosDiags.java:303)
          	at org.apache.hadoop.security.KerberosDiags.validateUser(KerberosDiags.java:289)
          	at org.apache.hadoop.security.KerberosDiags.execute(KerberosDiags.java:195)
          	at org.apache.slider.client.SliderClient.actionKDiag(SliderClient.java:3799)
          	at org.apache.slider.client.SliderClient.exec(SliderClient.java:397)
          	at org.apache.slider.client.SliderClient.runService(SliderClient.java:326)
          	at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
          	at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
          	at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
          	at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
          	at org.apache.slider.Slider.main(Slider.java:49)
          </stdout>
          2015-12-16 16:08:28,335 [JUnit] ERROR framework.ShellBase (NativeMethodAccessorImpl.java:invoke0(?)) - 
          <stderr>
          
          
          Kerberos Diagnostics scan at Wed Dec 16 16:08:26 GMT 2015
          
          
          
          System Properties
          
          java.security.krb5.conf = "(unset)"
          java.security.krb5.realm = "(unset)"
          sun.security.krb5.debug = "(unset)"
          sun.security.spnego.debug = "(unset)"
          
          
          Environment Variables
          
          HADOOP_JAAS_DEBUG = "true"
          KRB5CCNAME = "(unset)"
          HADOOP_USER_NAME = "(unset)"
          HADOOP_PROXY_USER = "(unset)"
          HADOOP_TOKEN_FILE_LOCATION = "(unset)"
          hadoop.kerberos.kinit.command = "kinit"
          hadoop.security.authentication = "kerberos"
          hadoop.security.authorization = "true"
          hadoop.security.dns.interface = "(unset)"
          hadoop.security.dns.nameserver = "(unset)"
          hadoop.ssl.enabled = "false"
          hadoop.rpc.protection = "authentication"
          hadoop.security.saslproperties.resolver.class = "(unset)"
          hadoop.security.crypto.codec.classes = "(unset)"
          hadoop.security.group.mapping = "org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback"
          
          
          Logging in
          
          
          
          Log in user
          
          UGI=stevel (auth:KERBEROS)
          Has kerberos credentials: false
          Authentication method: KERBEROS
          Real Authentication method: KERBEROS
          
          
          Group names
          
          staff
          access_bpf
          everyone
          localaccounts
          _appserverusr
          admin
          _appserveradm
          _lpadmin
          com.apple.access_ssh
          _appstore
          _lpoperator
          _developer
          com.apple.access_ftp
          com.apple.access_screensharing
          
          
          Credentials
          
          
          
          Secret keys
          
          (none)
          
          
          Tokens
          
          (none)
          Ticket based login: false
          Keytab based login: false
          </stderr>
          Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.883 sec <<< FAILURE! - in org.apache.slider.funtest.commands.KDiagCommandIT
          testKdiag(org.apache.slider.funtest.commands.KDiagCommandIT)  Time elapsed: 3.313 sec  <<< ERROR!
          org.apache.slider.core.exceptions.SliderException: Expected exit code of command /Users/stevel/Projects/Hortonworks/Projects/slider/slider-assembly/target/slider-0.90.0-incubating-SNAPSHOT-all/slider-0.90.0-incubating-SNAPSHOT/bin/slider kdiag --fail : 0 - actual=41 
          	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
          	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
          	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
          	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
          	at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:80)
          	at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:74)
          	at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
          	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
          	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
          	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:255)
          	at org.apache.slider.funtest.framework.ShellBase.assertExitCode(ShellBase.groovy:195)
          	at org.apache.slider.funtest.framework.ShellBase.assertExitCode(ShellBase.groovy)
          	at org.apache.slider.funtest.framework.CommandTestBase.assertExitCode(CommandTestBase.groovy:502)
          	at org.apache.slider.funtest.framework.CommandTestBase.assertSuccess(CommandTestBase.groovy:485)
          	at org.apache.slider.funtest.commands.KDiagCommandIT.testKdiag(KDiagCommandIT.groovy:36)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:606)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
          	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
          	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
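The run above shows `kdiag --fail` exiting with 41 on a secure cluster where the user has no credentials, while the test expected 0. The exit-code decision can be sketched as below. This is inferred from the comments and log in this issue rather than taken from the Slider source: the class and constant names are hypothetical, and returning 0 when `--fail` is absent is an assumption.

```java
/** Illustrative only: exit-code policy inferred from the comments and test run in this issue. */
final class KdiagExitPolicy {

  static final int EXIT_OK = 0;

  /** 41 is the value quoted in this issue; the constant name is hypothetical. */
  static final int EXIT_AUTH_FAILURE = 41;

  /**
   * Decide the process exit code.
   * Assumption: without the fail flag the command reports the problem but still exits 0.
   */
  static int exitCode(boolean securityEnabled,
                      boolean hasKerberosCredentials,
                      boolean failRequested) {
    // An insecure cluster is never an auth failure, whatever the credentials.
    boolean authFailure = securityEnabled && !hasKerberosCredentials;
    return (authFailure && failRequested) ? EXIT_AUTH_FAILURE : EXIT_OK;
  }

  public static void main(String[] args) {
    // Secure cluster, no TGT, --fail given: the situation in the test run above.
    System.out.println(exitCode(true, false, true)); // prints 41
  }
}
```

Under this reading the diagnostics behaved as intended in the run above, and the error came from the test asserting success in an environment where the user had no TGT.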
          
          stevel@apache.org Steve Loughran added a comment -

          There is now a diagnostics command. It still doesn't explain why ZK is playing up for me, though.

          jira-bot ASF subversion and git services added a comment -

          Commit af6bd409a0478f785cdcd6a851937ffe1b10859c in incubator-slider's branch refs/heads/develop from Steve Loughran
          [ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=af6bd40 ]

          SLIDER-1027 create KDiag command

          jira-bot ASF subversion and git services added a comment -

          Commit a36b25d9c46346216c77ee204bae4cb19c060765 in incubator-slider's branch refs/heads/develop from Steve Loughran
          [ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=a36b25d ]

          SLIDER-1027 add a kdiag command for kerberos diagnostics

          jira-bot ASF subversion and git services added a comment -

          Commit 819b127e49fdadd832f6289b05b31fd98e1cee58 in incubator-slider's branch refs/heads/develop from Steve Loughran
          [ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=819b127 ]

          SLIDER-1027 kdiag adds validation that user has kerberos credentials; doesn't fail on an insecure cluster, and adds a kdiagIT test to verify the kerberos bindings for the IT Test run

          jira-bot ASF subversion and git services added a comment -

          Commit ddca37b9f13430f50b5ade125d1c5461810aee6c in incubator-slider's branch refs/heads/develop from Steve Loughran
          [ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=ddca37b ]

          Merge branch 'feature/SLIDER-1027-kdiag' into develop


            People

            • Assignee: stevel@apache.org Steve Loughran
            • Reporter: stevel@apache.org Steve Loughran
            • Votes: 0
            • Watchers: 4
