Project: Ambari, Issue: AMBARI-7284

Hadoop cluster alerts need updates for Hadoop 2.4 and 2.5


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Hadoop branch-2, release 2.4 or 2.5.

    Description

      Many /var/log/messages alerts that we previously keyed off of no longer work or are no longer valid. Many Hadoop 1.x terms, such as jobtracker, tasktracker, and templeton, still appear to be in use when Ambari manages a Hadoop 2.x cluster.
      I believe the existing rules need to be updated for the following service name changes:
      resourcemanager_process_down
      resourcemanager_process_down_ok
      resourcemanager_rpc_latency
      resourcemanager_rpc_latency_ok
      resourcemanager_cpu_utilization
      resourcemanager_cpu_utilization_ok
      nodemanagers_down
      nodemanagers_down_ok
      nodemanager_process_down
      nodemanager_process_down_ok
      webhcat_down
      webhcat_down_ok
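      The renames above follow the Hadoop 2.x daemon changes (JobTracker to ResourceManager, TaskTracker to NodeManager, Templeton to WebHCat). A minimal sketch of rewriting old rule names, assuming the Hadoop 1.x rule names differ from the new ones only in those terms; the old names used below (jobtracker_process_down, tasktrackers_down_ok, templeton_down) are hypothetical, since this issue lists only the new names:

```python
# Hypothetical mapping from Hadoop 1.x daemon terms to their Hadoop 2.x equivalents.
TERM_RENAMES = {
    "jobtracker": "resourcemanager",
    "tasktracker": "nodemanager",
    "templeton": "webhcat",
}


def rename_rule(old_name: str) -> str:
    """Rewrite a Hadoop 1.x alert rule name to its Hadoop 2.x form.

    Names that contain none of the old terms are returned unchanged.
    """
    new_name = old_name
    for old_term, new_term in TERM_RENAMES.items():
        new_name = new_name.replace(old_term, new_term)
    return new_name


if __name__ == "__main__":
    # Hypothetical Hadoop 1.x rule names; only the renamed forms appear in this issue.
    for old in ["jobtracker_process_down", "tasktrackers_down_ok", "templeton_down"]:
        print(f"{old} -> {rename_rule(old)}")
```

      Plain substring replacement keeps the plural and _ok suffixes intact (tasktrackers_down_ok becomes nodemanagers_down_ok), which matches the pattern of the list above.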

      Existing messages also appear to be matched incorrectly, because the following HADOOP_UNKNOWN_MSG entry shows up in /var/log/messages:
      Jul 15 10:36:34 pitH1 nagios[35331]: Warning: Hadoop: HADOOP_UNKNOWN_MSG# Event Host=pitH1.td.teradata.com Service Description=HDFS::Percent DataNodes with space available(WARNING), WARNING: total:6, affected:1
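      One plausible way such a fallback arises is a rule table that has no entry for the event's Service Description. The sketch below parses the fields of the log line above and looks the description up in a table; the RULES contents (including the "NODEMANAGER::NodeManager process" description) and the convention of appending _ok for OK states are hypothetical illustrations, not Ambari's actual matching logic:

```python
import re

# Field layout taken from the Nagios syslog line quoted in this issue.
EVENT_RE = re.compile(
    r"Event Host=(?P<host>\S+) "
    r"Service Description=(?P<desc>[^(]+)\((?P<state>[A-Z]+)\), "
    r"(?P<detail>.*)$"
)

# Hypothetical rule table: Nagios service description -> alert rule name.
RULES = {
    "NODEMANAGER::NodeManager process": "nodemanager_process_down",
}

UNKNOWN = "HADOOP_UNKNOWN_MSG"


def classify(line: str) -> tuple[str, dict[str, str]]:
    """Return (rule_name, parsed_fields), falling back to HADOOP_UNKNOWN_MSG."""
    match = EVENT_RE.search(line)
    if match is None:
        return UNKNOWN, {}
    fields = match.groupdict()
    rule = RULES.get(fields["desc"].strip(), UNKNOWN)
    # Hypothetical convention: recovery (OK) events map to the paired *_ok rule.
    if rule != UNKNOWN and fields["state"] == "OK":
        rule += "_ok"
    return rule, fields


if __name__ == "__main__":
    line = ("Jul 15 10:36:34 pitH1 nagios[35331]: Warning: Hadoop: HADOOP_UNKNOWN_MSG# "
            "Event Host=pitH1.td.teradata.com Service Description=HDFS::Percent "
            "DataNodes with space available(WARNING), WARNING: total:6, affected:1")
    print(classify(line))
```

      Run against the quoted line, the parse succeeds but the description has no table entry, so the event is reported as HADOOP_UNKNOWN_MSG even though its host, state, and detail fields are all well formed.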


          People

            Assignee: Artem Stepanovich Baranchuk (abaranchuk)
            Reporter: Nicholas Yao (nicholas.yao)
            Votes: 0
            Watchers: 4
