Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Steps to reproduce:
- Install a cluster with Hadoop, Tez, HBase, Hive, Spark
- Enable wire encryption
- Run Tez service check
Here, agent.service.check.task.timeout is set to 600 sec. The Tez application was started in the background. The service check then looks for the SUCCESS file for only a couple of minutes. In this particular instance, the application took 5 minutes to run, so the check for the SUCCESS file on HDFS failed.
In this scenario, the service check should be reported as failed rather than timed out.
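The timing described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Ambari's service check code: `poll_for_marker`, `FakeClock`, the polling budget of 24 tries at 5-second intervals (roughly "a couple of minutes"), and the `COMPLETED` label for the success case are all hypothetical names and values chosen for illustration. Only the 600-second task timeout and the 5-minute job duration come from the report.

```python
TASK_TIMEOUT_S = 600   # agent.service.check.task.timeout, from the report
JOB_DURATION_S = 300   # the Tez application took about 5 minutes, from the report


class FakeClock:
    """Simulated clock so the example runs instantly (hypothetical helper)."""

    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds


def poll_for_marker(marker_exists, tries, try_sleep, sleep):
    """Check for the marker up to `tries` times, sleeping between attempts."""
    for attempt in range(1, tries + 1):
        if marker_exists():
            return True
        if attempt < tries:
            sleep(try_sleep)
    return False


clock = FakeClock()
# Assumed polling budget: 24 tries, 5 s apart (about two minutes in total).
found = poll_for_marker(
    marker_exists=lambda: clock.now >= JOB_DURATION_S,
    tries=24,
    try_sleep=5,
    sleep=clock.sleep,
)

# The polling window ends long before the task timeout, so the outcome is
# already known and is a failure, not a timeout.
status = "COMPLETED" if found else "FAILED"
print(status, "after", clock.now, "s; task timeout is", TASK_TIMEOUT_S, "s")
```

In this sketch the polling gives up well inside the task timeout, which is why a failed result is the more accurate status for the scenario in the report.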
stderr: /var/lib/ambari-agent/data/errors-370.txt
stdout: /var/lib/ambari-agent/data/output-370.txt
2017-02-08 03:55:55,017 - HdfsResource['/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz'] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'source': '/usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz', 'dfs_type': '', 'default_fs': 'hdfs://host:8020', 'replace_existing_files': False, 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs@EXAMPLE.COM', 'user': 'hdfs', 'owner': 'hdfs', 'group': 'hadoop', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'file', 'action': ['create_on_execute'], 'immutable_paths': [u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp'], 'mode': 0444}
2017-02-08 03:55:55,017 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM'] {'user': 'hdfs'}
2017-02-08 03:55:55,096 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET --negotiate -u : -k '"'"'https://host:50470/webhdfs/v1/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpoIadeN 2>/tmp/tmp6nFiLj''] {'logoutput': None, 'quiet': False}
2017-02-08 03:55:55,292 - call returned (0, '')
2017-02-08 03:55:55,293 - DFS file /hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz is identical to /usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz, skipping the copying
2017-02-08 03:55:55,293 - Will attempt to copy tez tarball from /usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz to DFS at /hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz.
2017-02-08 03:55:55,293 - HdfsResource[None] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'dfs_type': '', 'default_fs': 'hdfs://host:8020', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs@EXAMPLE.COM', 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'immutable_paths': [u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp']}
2017-02-08 03:55:55,294 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-cl1@EXAMPLE.COM;'] {'user': 'ambari-qa'}
2017-02-08 03:55:55,389 - ExecuteHadoop['jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'try_sleep': 5, 'tries': 3, 'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'user': 'ambari-qa', 'conf_dir': '/usr/hdp/current/hadoop-client/conf'}
2017-02-08 03:55:55,390 - Execute['hadoop --config /usr/hdp/current/hadoop-client/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'logoutput': None, 'try_sleep': 5, 'environment': {}, 'tries': 3, 'user': 'ambari-qa', 'path': ['/usr/hdp/current/hadoop-client/bin']}
Requests: {
  aborted_task_count: 0,
  cluster_name: "cl1",
  completed_task_count: 1,
  create_time: 1486526151743,
  end_time: 1486526463038,
  exclusive: false,
  failed_task_count: 0,
  id: 29,
  inputs: "{}",
  operation_level: null,
  progress_percent: 100,
  queued_task_count: 0,
  request_context: "WE API TEZ Service Check",
  request_schedule: null,
  request_status: "TIMEDOUT",
  resource_filters: [ { service_name: "TEZ" } ],
  start_time: 1486526151751,
  task_count: 1,
  timed_out_task_count: 1,
  type: "COMMAND"
},
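The request's wall-clock duration can be derived from the start_time and end_time fields above. The sketch below assumes these values are Unix epoch milliseconds in UTC, which is not stated in the report; the variable names are chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the Requests payload; assumed to be epoch milliseconds.
START_TIME_MS = 1486526151751
END_TIME_MS = 1486526463038

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Integer arithmetic avoids floating-point rounding of the millisecond part.
start = EPOCH + timedelta(milliseconds=START_TIME_MS)
end = EPOCH + timedelta(milliseconds=END_TIME_MS)
elapsed_ms = END_TIME_MS - START_TIME_MS

print("start:  ", start.isoformat())
print("end:    ", end.isoformat())
print("elapsed:", elapsed_ms / 1000, "seconds")
```

Under that assumption the request ran for about 311 seconds, which is consistent with the roughly 5-minute application run described above and well short of the 600-second task timeout.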
Issue Links
- is duplicated by AMBARI-19946 Task status should be set to ABORTED when heartbeat is lost (Resolved)