Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 2.2.2
- None
Description
Build: Ambari 2.2.1.1, #63
Observed this issue in a couple of recent Express Upgrade (EU) runs, where the YARN service check reports failure intermittently:
a. In one test, the EU ran from HDP 2.3.4.0 to 2.4.0.0 and the YARN service check failed during the EU itself; retrying the operation made the service check succeed.
b. In another test, the YARN service check failed when run after the EU; when I ran it again, it succeeded.
There appears to be a corner case that triggers this only intermittently. In the failing run logged below, the distributed shell application reached yarnAppState=FINISHED with distributedFinalState=FAILED, and the check command returned 2.
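Since a retry succeeded in both cases, one possible mitigation (not necessarily the fix that was applied for this issue) is to retry the check command a few times before reporting failure. The traceback below shows checked_call being invoked with tries and try_sleep arguments; the standalone sketch here only mimics that idea with the Python standard library. The helper name run_with_retries and its defaults are assumptions made for illustration and are not part of Ambari.

```python
import subprocess
import time


def run_with_retries(command, tries=3, try_sleep=5):
    """Run a shell command, retrying while it exits non-zero.

    Hypothetical helper for illustration only; it is not Ambari code.
    Returns (exit_code, combined_output) of the last attempt.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    code, output = 1, ""
    for attempt in range(1, tries + 1):
        proc = subprocess.run(
            command,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        code, output = proc.returncode, proc.stdout
        if code == 0:
            break  # success, no further attempts needed
        if attempt < tries:
            time.sleep(try_sleep)  # wait before the next attempt
    return code, output
```

A retry like this would only hide the intermittent failure; the underlying cause of the FAILED distributed shell run would still need to be investigated.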
stderr: /var/lib/ambari-agent/data/errors-822.txt
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 142, in <module>
    ServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 104, in service_check
    user=params.smokeuser,
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa@EXAMPLE.COM; yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar' returned 2.
######## Hortonworks #############
This is MOTD message, added for testing in qe infra
16/03/03 02:33:51 INFO impl.TimelineClientImpl: Timeline service address: http://host:8188/ws/v1/timeline/
16/03/03 02:33:51 INFO distributedshell.Client: Initializing Client
16/03/03 02:33:51 INFO distributedshell.Client: Running Client
16/03/03 02:33:51 INFO client.RMProxy: Connecting to ResourceManager at host-9-5.test/127.0.0.254:8050
16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=3
16/03/03 02:33:53 INFO distributedshell.Client: Got Cluster node info from ASM
16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, nodeId=host:25454, nodeAddresshost:8042, nodeRackName/default-rack, nodeNumContainers1
16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, nodeId=host-9-5.test:25454, nodeAddresshost-9-5.test:8042, nodeRackName/default-rack, nodeNumContainers0
16/03/03 02:33:53 INFO distributedshell.Client: Got node report from ASM for, nodeId=host-9-1.test:25454, nodeAddresshost-9-1.test:8042, nodeRackName/default-rack, nodeNumContainers0
16/03/03 02:33:53 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.083333336, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
16/03/03 02:33:53 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
16/03/03 02:33:53 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 10240
16/03/03 02:33:53 INFO distributedshell.Client: Max virtual cores capabililty of resources in this cluster 1
16/03/03 02:33:53 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
16/03/03 02:33:53 INFO distributedshell.Client: Set the environment for the application master
16/03/03 02:33:53 INFO distributedshell.Client: Setting up app master command
16/03/03 02:33:53 INFO distributedshell.Client: Completed setting up app master command {{JAVA_HOME}}/bin/java -Xmx10m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
16/03/03 02:33:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 290 for ambari-qa on 127.0.0.235:8020
16/03/03 02:33:53 INFO distributedshell.Client: Got dt for hdfs://host-9-1.test:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 127.0.0.235:8020, Ident: (HDFS_DELEGATION_TOKEN token 290 for ambari-qa)
16/03/03 02:33:53 INFO distributedshell.Client: Submitting application to ASM
16/03/03 02:33:54 INFO impl.YarnClientImpl: Submitted application application_1456970141888_0011
16/03/03 02:33:55 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:33:56 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:33:57 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:33:58 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:33:59 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:00 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:01 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:02 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:03 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:04 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:05 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:06 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:07 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:08 INFO distributedshell.Client: Got application report from ASM for, appId=11, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host-9-1/127.0.0.235, appQueue=default, appMasterRpcPort=-1, appStartTime=1456972434150, yarnAppState=FINISHED, distributedFinalState=FAILED, appTrackingUrl=http://host-9-5.test:8088/proxy/application_1456970141888_0011/, appUser=ambari-qa
16/03/03 02:34:08 INFO distributedshell.Client: Application did finished unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop
16/03/03 02:34:08 ERROR distributedshell.Client: Application failed to complete successfully

stdout: /var/lib/ambari-agent/data/output-822.txt
2016-03-03 02:33:47,974 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-03-03 02:33:48,013 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-03-03 02:33:48,018 - checked_call['/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa@EXAMPLE.COM; yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar'] {'path': '/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin', 'user': 'ambari-qa'}
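
The application reports in the stderr output repeat once per second, which makes the actual state changes hard to spot. As a triage aid, here is a minimal sketch that collapses those reports into the points where yarnAppState or distributedFinalState changes. The function name state_transitions and the regular expression are assumptions written against the log format shown in this report; they are not part of YARN or Ambari.

```python
import re

# Matches one "Got application report" entry and captures its timestamp,
# yarnAppState and distributedFinalState. DOTALL and \s+ let the pattern
# tolerate log text that has been wrapped across lines.
REPORT_RE = re.compile(
    r"(\d\d/\d\d/\d\d\s+\d\d:\d\d:\d\d) INFO distributedshell\.Client: "
    r"Got application report from ASM for.*?"
    r"yarnAppState=(\w+),\s+distributedFinalState=(\w+)",
    re.DOTALL,
)


def state_transitions(log_text):
    """Return [(timestamp, yarn_state, final_state)] at each state change.

    Hypothetical helper for illustration only.
    """
    transitions = []
    last = None
    for timestamp, yarn_state, final_state in REPORT_RE.findall(log_text):
        current = (yarn_state, final_state)
        if current != last:
            # Normalize any whitespace inside the captured timestamp.
            transitions.append((" ".join(timestamp.split()), yarn_state, final_state))
            last = current
    return transitions
```

Applied to the stderr output above, this would report three entries: ACCEPTED at 02:33:55, RUNNING at 02:34:04, and FINISHED with distributedFinalState=FAILED at 02:34:08, which shows the application master ran for about four seconds before the run was marked failed.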