Ambari / AMBARI-12622

Malformed Alert Data Can Prevent Alerts From Reporting

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.1
    • Component/s: ambari-agent
    • Labels: None

    Description

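      If an alert definition contains malformed template text, the alert's collection job raises an exception on every scheduled run, so the alert is never reported. The agent log shows the failure repeating at each two-minute interval: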

      INFO 2015-07-23 14:10:15,209 scheduler.py:287 - Adding job tentatively -- it will be properly scheduled when the scheduler starts
      INFO 2015-07-23 14:10:15,209 AlertSchedulerHandler.py:318 - [AlertScheduler] Scheduling datanode_process with UUID 43536b17-596a-4f7d-87e6-c9034b2b99bc
      INFO 2015-07-23 14:10:15,209 AlertSchedulerHandler.py:134 - [AlertScheduler] Starting <ambari_agent.apscheduler.scheduler.Scheduler object at 0x10ea910>; currently running: False
      INFO 2015-07-23 14:10:17,219 hostname.py:87 - Read public hostname 'cn105-10.l42scl.hortonworks.com' using socket.getfqdn()
      INFO 2015-07-23 14:11:15,271 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:11:15,274 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:12:15,277 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:12:15,284 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      ERROR 2015-07-23 14:12:15,293 scheduler.py:520 - Job "28cad184-0e94-4b19-af31-56ab2c1d0a74 (trigger: interval[0:02:00], next run at: 2015-07-23 14:14:15.207467)" raised an exception
      Traceback (most recent call last):
        File "/usr/lib/python2.6/site-packages/ambari_agent/apscheduler/scheduler.py", line 512, in _run_job
          retval = job.func(*job.args, **job.kwargs)
        File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 114, in <lambda>
          return lambda: alert_def.collect()
        File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
          data['text'] = res_base_text.format(*res[1])
      ValueError: Unknown format code 'd' for object of type 'float'
      INFO 2015-07-23 14:13:15,264 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:13:15,272 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      ERROR 2015-07-23 14:14:15,273 scheduler.py:520 - Job "28cad184-0e94-4b19-af31-56ab2c1d0a74 (trigger: interval[0:02:00], next run at: 2015-07-23 14:16:15.207467)" raised an exception
      Traceback (most recent call last):
        File "/usr/lib/python2.6/site-packages/ambari_agent/apscheduler/scheduler.py", line 512, in _run_job
          retval = job.func(*job.args, **job.kwargs)
        File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 114, in <lambda>
          return lambda: alert_def.collect()
        File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
          data['text'] = res_base_text.format(*res[1])
      ValueError: Unknown format code 'd' for object of type 'float'
      INFO 2015-07-23 14:14:15,272 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:14:15,277 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:15:15,267 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:15:15,275 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      ERROR 2015-07-23 14:16:15,269 scheduler.py:520 - Job "28cad184-0e94-4b19-af31-56ab2c1d0a74 (trigger: interval[0:02:00], next run at: 2015-07-23 14:18:15.207467)" raised an exception
      Traceback (most recent call last):
        File "/usr/lib/python2.6/site-packages/ambari_agent/apscheduler/scheduler.py", line 512, in _run_job
          retval = job.func(*job.args, **job.kwargs)
        File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 114, in <lambda>
          return lambda: alert_def.collect()
        File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
          data['text'] = res_base_text.format(*res[1])
      ValueError: Unknown format code 'd' for object of type 'float'
      INFO 2015-07-23 14:16:15,273 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:16:15,281 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:17:15,262 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:17:15,268 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      ERROR 2015-07-23 14:18:15,269 scheduler.py:520 - Job "28cad184-0e94-4b19-af31-56ab2c1d0a74 (trigger: interval[0:02:00], next run at: 2015-07-23 14:20:15.207467)" raised an exception
      Traceback (most recent call last):
        File "/usr/lib/python2.6/site-packages/ambari_agent/apscheduler/scheduler.py", line 512, in _run_job
          retval = job.func(*job.args, **job.kwargs)
        File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 114, in <lambda>
          return lambda: alert_def.collect()
        File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
          data['text'] = res_base_text.format(*res[1])
      ValueError: Unknown format code 'd' for object of type 'float'
      INFO 2015-07-23 14:18:15,272 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:18:15,273 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:19:15,265 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:19:15,267 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      ERROR 2015-07-23 14:20:15,277 scheduler.py:520 - Job "28cad184-0e94-4b19-af31-56ab2c1d0a74 (trigger: interval[0:02:00], next run at: 2015-07-23 14:22:15.207467)" raised an exception
      Traceback (most recent call last):
        File "/usr/lib/python2.6/site-packages/ambari_agent/apscheduler/scheduler.py", line 512, in _run_job
          retval = job.func(*job.args, **job.kwargs)
        File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 114, in <lambda>
          return lambda: alert_def.collect()
        File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
          data['text'] = res_base_text.format(*res[1])
      ValueError: Unknown format code 'd' for object of type 'float'
      INFO 2015-07-23 14:20:15,280 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:20:15,289 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:21:15,266 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:21:15,272 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      ERROR 2015-07-23 14:22:15,270 scheduler.py:520 - Job "28cad184-0e94-4b19-af31-56ab2c1d0a74 (trigger: interval[0:02:00], next run at: 2015-07-23 14:24:15.207467)" raised an exception
      Traceback (most recent call last):
        File "/usr/lib/python2.6/site-packages/ambari_agent/apscheduler/scheduler.py", line 512, in _run_job
          retval = job.func(*job.args, **job.kwargs)
        File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 114, in <lambda>
          return lambda: alert_def.collect()
        File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
          data['text'] = res_base_text.format(*res[1])
      ValueError: Unknown format code 'd' for object of type 'float'
      INFO 2015-07-23 14:22:15,275 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:22:15,276 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:23:15,267 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:23:15,276 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      ERROR 2015-07-23 14:24:15,270 scheduler.py:520 - Job "28cad184-0e94-4b19-af31-56ab2c1d0a74 (trigger: interval[0:02:00], next run at: 2015-07-23 14:26:15.207467)" raised an exception
      Traceback (most recent call last):
        File "/usr/lib/python2.6/site-packages/ambari_agent/apscheduler/scheduler.py", line 512, in _run_job
          retval = job.func(*job.args, **job.kwargs)
        File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 114, in <lambda>
          return lambda: alert_def.collect()
        File "/usr/lib/python2.6/site-packages/ambari_agent/alerts/base_alert.py", line 153, in collect
          data['text'] = res_base_text.format(*res[1])
      ValueError: Unknown format code 'd' for object of type 'float'
      INFO 2015-07-23 14:24:15,274 logger.py:66 - Execute['export HIVE_CONF_DIR='/usr/hdp/current/hive-metastore/conf/conf.server' ; hive --hiveconf hive.metastore.uris=thrift://cn105-10.l42scl.hortonworks.com:9083                 --hiveconf hive.metastore.client.connect.retry.delay=1                 --hiveconf hive.metastore.failure.retries=1                 --hiveconf hive.metastore.connect.retries=1                 --hiveconf hive.metastore.client.socket.timeout=14                 --hiveconf hive.execution.engine=mr -e 'show databases;''] {'path': ['/bin/', '/usr/bin/', '/usr/sbin/', '/usr/hdp/current/hive-metastore/bin'], 'user': 'ambari-qa', 'timeout': 30}
      INFO 2015-07-23 14:24:15,282 logger.py:66 - Execute['! beeline -u 'jdbc:hive2://cn105-10.l42scl.hortonworks.com:10000/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL''] {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 30}
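
The traceback above ends in `res_base_text.format(*res[1])`, where a template using the integer format code `d` is applied to a float. The sketch below reproduces that `ValueError` in isolation and shows one possible way to keep a bad template from aborting the whole collection run. It is not the attached patch: `render_alert_text` and the sample templates are hypothetical names introduced only for illustration.

```python
def render_alert_text(template, args, fallback="Alert text could not be formatted"):
    """Format an alert template; return a diagnostic string instead of raising.

    Hypothetical helper, not part of ambari-agent.
    """
    try:
        return template.format(*args)
    except (ValueError, IndexError, KeyError) as err:
        # ValueError: bad format code for the argument type (the case in the log)
        # IndexError / KeyError: template references an argument that was not supplied
        return "{0}: {1!r} with arguments {2!r} ({3})".format(
            fallback, template, list(args), err)


# Illustrative templates (assumed, not taken from a real alert definition).
good_template = "Capacity used: {0:.1f}%"
bad_template = "Capacity used: {0:d}%"   # 'd' is only valid for integers

print(render_alert_text(good_template, [42.5]))
print(render_alert_text(bad_template, [42.5]))
```

With a guard like this, the malformed template still yields some alert text that names the offending template, so the problem is visible in the reported alert rather than only in the agent log.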
      

      Attachments

        1. AMBARI-12622.patch
          11 kB
          Jonathan Hurley

            People

              Assignee: Jonathan Hurley (jonathanhurley)
              Reporter: Jonathan Hurley (jonathanhurley)
              Votes: 0
              Watchers: 3