  1. Ambari
  2. AMBARI-19930

The service check status was set to TIMEOUT even though the service check had failed


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      Steps to reproduce:

      • Install a cluster with Hadoop, Tez, HBase, Hive, Spark
      • Enable wire encryption
      • Run Tez service check

      Here, agent.service.check.task.timeout is set to 600 seconds. The Tez application was started in the background. The service check then looks for the SUCCESS file for only a couple of minutes. In this particular instance, the application took 5 minutes to run, so the check for the SUCCESS file on HDFS failed.

      In this scenario, the service check status should be reported as failed instead of timed out.
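The expected behaviour described above can be sketched roughly as follows. This is only an illustration, not Ambari's actual implementation: `wait_for_success_file`, `classify_status`, `ServiceCheckFailed`, and the 120-second polling window are hypothetical names and values chosen for the example. Only the status strings FAILED, TIMEDOUT, and COMPLETED and the 600-second task timeout come from Ambari and this report.

```python
import time


class ServiceCheckFailed(Exception):
    """Hypothetical error: the check itself gave up (not the agent task timeout)."""


def wait_for_success_file(exists, poll_window_sec, poll_interval_sec=5,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll `exists()` until it returns True or the polling window elapses.

    `exists` is any callable returning True once the SUCCESS file is present.
    Raises ServiceCheckFailed if the file never shows up within the window.
    """
    deadline = clock() + poll_window_sec
    while True:
        if exists():
            return
        if clock() >= deadline:
            raise ServiceCheckFailed(
                "SUCCESS file not found within %s seconds" % poll_window_sec)
        sleep(poll_interval_sec)


def classify_status(elapsed_sec, task_timeout_sec, check_error):
    """Map the outcome of a service check to a status string.

    Assumed rule for this sketch: only report TIMEDOUT when the task really
    ran past the agent task timeout; a check that gave up earlier is FAILED.
    """
    if elapsed_sec >= task_timeout_sec:
        return "TIMEDOUT"
    if check_error is not None:
        return "FAILED"
    return "COMPLETED"


# The situation in this report: the check gave up after about 5 minutes,
# well inside the 600-second agent.service.check.task.timeout.
print(classify_status(300, 600, ServiceCheckFailed("SUCCESS file missing")))
```

Under the assumed rule, the scenario in this report maps to FAILED, because the polling window for the SUCCESS file ran out long before the agent task timeout did.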

      stderr:   /var/lib/ambari-agent/data/errors-370.txt
      
      stdout:   /var/lib/ambari-agent/data/output-370.txt
      
      2017-02-08 03:55:55,017 - HdfsResource['/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz'] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'source': '/usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz', 'dfs_type': '', 'default_fs': 'hdfs://host:8020', 'replace_existing_files': False, 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs@EXAMPLE.COM', 'user': 'hdfs', 'owner': 'hdfs', 'group': 'hadoop', 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'file', 'action': ['create_on_execute'], 'immutable_paths': [u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp'], 'mode': 0444}
      2017-02-08 03:55:55,017 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM'] {'user': 'hdfs'}
      2017-02-08 03:55:55,096 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET --negotiate -u : -k '"'"'https://host:50470/webhdfs/v1/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpoIadeN 2>/tmp/tmp6nFiLj''] {'logoutput': None, 'quiet': False}
      2017-02-08 03:55:55,292 - call returned (0, '')
      2017-02-08 03:55:55,293 - DFS file /hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz is identical to /usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz, skipping the copying
      2017-02-08 03:55:55,293 - Will attempt to copy tez tarball from /usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz to DFS at /hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz.
      2017-02-08 03:55:55,293 - HdfsResource[None] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'dfs_type': '', 'default_fs': 'hdfs://host:8020', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs@EXAMPLE.COM', 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'immutable_paths': [u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp']}
      2017-02-08 03:55:55,294 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-cl1@EXAMPLE.COM;'] {'user': 'ambari-qa'}
      2017-02-08 03:55:55,389 - ExecuteHadoop['jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'try_sleep': 5, 'tries': 3, 'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'user': 'ambari-qa', 'conf_dir': '/usr/hdp/current/hadoop-client/conf'}
      2017-02-08 03:55:55,390 - Execute['hadoop --config /usr/hdp/current/hadoop-client/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'logoutput': None, 'try_sleep': 5, 'environment': {}, 'tries': 3, 'user': 'ambari-qa', 'path': ['/usr/hdp/current/hadoop-client/bin']}
      Requests: {
        aborted_task_count: 0,
        cluster_name: "cl1",
        completed_task_count: 1,
        create_time: 1486526151743,
        end_time: 1486526463038,
        exclusive: false,
        failed_task_count: 0,
        id: 29,
        inputs: "{}",
        operation_level: null,
        progress_percent: 100,
        queued_task_count: 0,
        request_context: "WE API TEZ Service Check",
        request_schedule: null,
        request_status: "TIMEDOUT",
        resource_filters: [
          {
            service_name: "TEZ"
          }
        ],
        start_time: 1486526151751,
        task_count: 1,
        timed_out_task_count: 1,
        type: "COMMAND"
      },
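As a quick sanity check on the request above, the elapsed time can be computed from start_time and end_time. The sketch below assumes those fields are epoch timestamps in milliseconds and takes the 600-second limit from the description; the variable names are illustrative only.

```python
from datetime import datetime, timezone

# 600 seconds, per agent.service.check.task.timeout in the description.
TASK_TIMEOUT_SEC = 600

# Values copied from the request shown above (assumed to be epoch milliseconds).
request = {
    "start_time": 1486526151751,
    "end_time": 1486526463038,
    "request_status": "TIMEDOUT",
}

elapsed_sec = (request["end_time"] - request["start_time"]) / 1000.0
started_utc = datetime.fromtimestamp(request["start_time"] / 1000.0,
                                     tz=timezone.utc)

print("started (UTC):", started_utc.strftime("%Y-%m-%d %H:%M:%S"))
print("elapsed seconds:", elapsed_sec)
print("under task timeout:", elapsed_sec < TASK_TIMEOUT_SEC)
```

The request ran for roughly 311 seconds, a little over 5 minutes and well under the 600-second task timeout, yet request_status is TIMEDOUT. That is consistent with the report that the check failed on its own rather than hitting the agent timeout.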

            People

              Assignee: mpapirkovskyy Papirkovskyy Myroslav
              Reporter: yeshavora Yesha Vora
