Ambari / AMBARI-19930

Service check status is set to TIMEOUT even when the service check has failed



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed


      Steps to reproduce:

      • Install a cluster with Hadoop, Tez, HBase, Hive, Spark
      • Enable wire encryption
      • Run Tez service check

      Here, agent.service.check.task.timeout is set to 600 sec. The Tez application was started in the background. The service check then looks for the SUCCESS file for only a couple of minutes. In this particular instance, the application took 5 minutes to run, so the check for the SUCCESS file on HDFS failed.

      In this scenario, the service check status should be FAILED instead of TIMEOUT.
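The expected behaviour can be sketched roughly as follows. This is only an illustration of the rule being requested, not Ambari's actual code: the function name `classify_service_check` and its parameters are hypothetical, and the status strings other than TIMEDOUT (which appears in the request output) are assumed for the example.

```python
def classify_service_check(exit_code, elapsed_sec, timeout_sec=600):
    """Return a status string for a service check (hypothetical helper).

    exit_code   -- exit code of the check command, or None if it never finished
    elapsed_sec -- how long the check has been running, in seconds
    timeout_sec -- task timeout; 600 mirrors agent.service.check.task.timeout here
    """
    if exit_code is None:
        # The command did not finish: only this case may become a timeout.
        return "TIMEDOUT" if elapsed_sec >= timeout_sec else "IN_PROGRESS"
    if exit_code == 0:
        return "COMPLETED"
    # The command finished with an error: report a failure regardless of duration.
    return "FAILED"


# A check that gives up after about 5 minutes with a non-zero exit code
# should be reported as a failure, because the 600 sec limit was never reached.
status = classify_service_check(exit_code=1, elapsed_sec=300)
```

Under these assumptions, a timeout is reported only when the command is still unfinished once the limit has passed; a check that exits with an error before that is a failure.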

      stderr:   /var/lib/ambari-agent/data/errors-370.txt
      stdout:   /var/lib/ambari-agent/data/output-370.txt
      2017-02-08 03:55:55,017 - HdfsResource['/hdp/apps/'] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'source': '/usr/hdp/', 'dfs_type': '', 'default_fs': 'hdfs://host:8020', 'replace_existing_files': False, 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs@EXAMPLE.COM', 'user': 'hdfs', 'owner': 'hdfs', 'group': 'hadoop', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'file', 'action': ['create_on_execute'], 'immutable_paths': [u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp'], 'mode': 0444}
      2017-02-08 03:55:55,017 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM'] {'user': 'hdfs'}
      2017-02-08 03:55:55,096 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET --negotiate -u : -k '"'"'https://host:50470/webhdfs/v1/hdp/apps/'"'"' 1>/tmp/tmpoIadeN 2>/tmp/tmp6nFiLj''] {'logoutput': None, 'quiet': False}
      2017-02-08 03:55:55,292 - call returned (0, '')
      2017-02-08 03:55:55,293 - DFS file /hdp/apps/ is identical to /usr/hdp/, skipping the copying
      2017-02-08 03:55:55,293 - Will attempt to copy tez tarball from /usr/hdp/ to DFS at /hdp/apps/
      2017-02-08 03:55:55,293 - HdfsResource[None] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'dfs_type': '', 'default_fs': 'hdfs://host:8020', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs@EXAMPLE.COM', 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'immutable_paths': [u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp']}
      2017-02-08 03:55:55,294 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-cl1@EXAMPLE.COM;'] {'user': 'ambari-qa'}
      2017-02-08 03:55:55,389 - ExecuteHadoop['jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'try_sleep': 5, 'tries': 3, 'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'user': 'ambari-qa', 'conf_dir': '/usr/hdp/current/hadoop-client/conf'}
      2017-02-08 03:55:55,390 - Execute['hadoop --config /usr/hdp/current/hadoop-client/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'logoutput': None, 'try_sleep': 5, 'environment': {}, 'tries': 3, 'user': 'ambari-qa', 'path': ['/usr/hdp/current/hadoop-client/bin']}
      Requests: {
      aborted_task_count: 0,
      cluster_name: "cl1",
      completed_task_count: 1,
      create_time: 1486526151743,
      end_time: 1486526463038,
      exclusive: false,
      failed_task_count: 0,
      id: 29,
      inputs: "{}",
      operation_level: null,
      progress_percent: 100,
      queued_task_count: 0,
      request_context: "WE API TEZ Service Check",
      request_schedule: null,
      request_status: "TIMEDOUT",
      resource_filters: [
      service_name: "TEZ"
      start_time: 1486526151751,
      task_count: 1,
      timed_out_task_count: 1,
      type: "COMMAND"
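
The timestamps in the request above can be compared with the configured timeout. The short sketch below assumes that start_time and end_time are epoch milliseconds and that the 600 sec value of agent.service.check.task.timeout applies to this task; the variable names are illustrative only.

```python
# Values copied from the request output above (assumed to be epoch milliseconds).
START_TIME_MS = 1486526151751
END_TIME_MS = 1486526463038

# agent.service.check.task.timeout as stated in the description, in seconds.
TASK_TIMEOUT_SEC = 600

elapsed_sec = (END_TIME_MS - START_TIME_MS) / 1000.0
finished_before_timeout = elapsed_sec < TASK_TIMEOUT_SEC

print("elapsed: %.3f sec, timeout: %d sec, finished before timeout: %s"
      % (elapsed_sec, TASK_TIMEOUT_SEC, finished_before_timeout))
```

The request ran for roughly 311 seconds, which is consistent with the description (the application took about 5 minutes) and well short of the 600 sec limit, yet request_status is TIMEDOUT.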





              Assignee: mpapirkovskyy Papirkovskyy Myroslav
              Reporter: yeshavora Yesha Vora