Ambari / AMBARI-16075

MR service check failed during EU (Intermittent)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: ambari-server
    • Labels: None

    Description

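      Seen in one of today's EU (Express Upgrade) runs: the MR service check failed during EU with the error below. The failing step was 'hadoop fs -test -e /user/ambari-qa/mapredsmokeoutput', which could not authenticate to the NameNode because no Kerberos TGT was found for the client. A retry of the failed task succeeded, and EU then proceeded to completion.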

      Traceback (most recent call last):
        File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/mapred_service_check.py", line 160, in <module>
          MapReduce2ServiceCheck().execute()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
          method(env)
        File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/mapred_service_check.py", line 155, in service_check
          conf_dir=params.hadoop_conf_dir
        File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
          self.env.run()
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
          self.run_action(resource, action)
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
          provider_action()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/execute_hadoop.py", line 54, in action_run
          environment = self.resource.environment,
        File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
          self.env.run()
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
          self.run_action(resource, action)
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
          provider_action()
        File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
          tries=self.resource.tries, try_sleep=self.resource.try_sleep)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
          result = function(command, **kwargs)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
          tries=tries, try_sleep=try_sleep)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
          result = _call(command, **kwargs_copy)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
          raise Fail(err_msg)
      resource_management.core.exceptions.Fail: Execution of 'hadoop --config /usr/hdp/2.4.2.0-243/hadoop/conf fs -test -e /user/ambari-qa/mapredsmokeoutput' returned 1. ######## Hortonworks #############
      This is MOTD message, added for testing in qe infra
      16/04/22 02:34:01 WARN ipc.Client: Exception encountered while connecting to the server : 
      javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
          at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
          at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
          at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:563)
          at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:378)
          at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:732)
          at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:728)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
          at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:727)
          at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
          at org.apache.hadoop.ipc.Client.getConnection(Client.java:1492)
          at org.apache.hadoop.ipc.Client.call(Client.java:1402)
          at org.apache.hadoop.ipc.Client.call(Client.java:1363)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
          at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:773)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
          at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
          at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2162)
          at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1363)
          at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
          at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
          at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
          at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
          at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
          at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
          at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
          at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
          at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
          at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
          at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
          at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
          at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
      Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
          at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
          at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
          at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
          at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
          at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
          at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
          at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
          ... 40 more
      test: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "os-r6-mzzdsu-ambari-se-eu-9-4.novalocal/172.22.79.206"; destination host is: "os-r6-mzzdsu-ambari-se-eu-9-3.novalocal":8020;

      EU path from 2.4.0.0 to 2.4.2.0-243
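      The description notes that simply retrying the failed task succeeded, and the traceback shows that resource_management's shell.checked_call already accepts tries and try_sleep arguments. As an illustration only (the attached patch's actual change is not described in this issue), here is a minimal Python 3 sketch of that retry pattern around the failing 'hadoop fs -test -e' call. The function name checked_call_with_retries, its injectable run parameter, and the local Fail class are hypothetical, and the sketch is not drop-in code for the agent's Python 2.6 runtime. A retry can only paper over a transient missing TGT; it does not create a Kerberos ticket.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


class Fail(Exception):
    """Hypothetical stand-in for resource_management.core.exceptions.Fail."""


def checked_call_with_retries(
    command: Sequence[str],
    tries: int = 3,
    try_sleep: float = 5.0,
    run: Optional[Callable[[Sequence[str]], subprocess.CompletedProcess]] = None,
) -> str:
    """Run `command` up to `tries` times, sleeping `try_sleep` seconds between attempts.

    Returns stdout of the first attempt that exits 0; raises Fail if all attempts
    exit non-zero. `run` is a hypothetical injection point (default: subprocess.run)
    so the retry loop can be exercised without a Kerberized cluster.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    if run is None:
        def run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
            return subprocess.run(list(cmd), capture_output=True, text=True)

    result = None
    for attempt in range(1, tries + 1):
        result = run(command)
        if result.returncode == 0:
            return result.stdout
        if attempt < tries:
            # Assumption: the failure is transient, as the successful manual
            # retry in this issue suggests, so wait and try again.
            time.sleep(try_sleep)
    raise Fail(
        "Execution of '%s' returned %d. %s"
        % (" ".join(command), result.returncode, result.stderr)
    )


# Hypothetical usage mirroring the failing check:
# checked_call_with_retries(
#     ["hadoop", "--config", "/usr/hdp/2.4.2.0-243/hadoop/conf",
#      "fs", "-test", "-e", "/user/ambari-qa/mapredsmokeoutput"],
#     tries=3, try_sleep=10)
```

      Injecting run keeps the retry loop testable without a cluster; with the default, each attempt is a real subprocess whose exit code decides whether to retry.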

    Attachments

      AMBARI-16075.patch (4 kB, Nate Cole)

    People

      Assignee: Nate Cole (ncole@hortonworks.com)
      Reporter: Nate Cole (ncole@hortonworks.com)
      Votes: 0
      Watchers: 2
