Ambari / AMBARI-23478

YARN Cluster CPU Usage Graph Always Shows High CPU Usage


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.8.0, 2.7.4
    • None

    Description

      ISSUE

      In Ambari, the YARN "Cluster CPU" widget always shows relatively high CPU usage when the cluster has more than one NodeManager.


      (started another node at around 19:00)

      REPRODUCE STEPS

      1. Install a cluster with one NodeManager and AMS.
      2. Confirm that the "Cluster CPU" widget looks OK.
      3. Add one more node with a NodeManager and wait for a while.

      INVESTIGATION

      The AMS side looks OK:

      curl -s -k http://sandbox-hdp.hortonworks.com:6188/ws/v1/timeline/metrics -G --data-urlencode metricNames=cpu_idle._sum --data-urlencode appId=NODEMANAGER --data-urlencode startTime=1521454794 --data-urlencode endTime=1521455394 --data-urlencode precision=MINUTES 
      ...
      {
          "metrics": [
              {
                  "appid": "nodemanager",
                  "metadata": {},
                  "metricname": "cpu_idle._sum",
                  "metrics": {
                      "1521454800000": 198.99000000000001,
                      "1521455100000": 192.56999999999999
                  },
                  "starttime": 1521454800000,
                  "timestamp": 1521454800000
              }
          ]
      }
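
The AMS values above look plausible if each host reports cpu_idle as a percentage, so that two NodeManager hosts sum to just under 200. A minimal Python sketch of that sanity check follows; the host count of 2 is an assumption taken from the reproduction steps, and the `summarize` helper is hypothetical, not part of AMS or Ambari.

```python
import json
from datetime import datetime, timezone

# Response body copied from the AMS query above.
AMS_RESPONSE = json.loads("""
{"metrics": [{"appid": "nodemanager", "metadata": {},
  "metricname": "cpu_idle._sum",
  "metrics": {"1521454800000": 198.99000000000001,
              "1521455100000": 192.56999999999999},
  "starttime": 1521454800000, "timestamp": 1521454800000}]}
""")

HOST_COUNT = 2  # assumption: two NodeManager hosts, as in the reproduction steps


def summarize(response, host_count):
    """Return (UTC time, summed value, per-host value) for every data point."""
    rows = []
    for metric in response["metrics"]:
        for ts_ms, value in sorted(metric["metrics"].items()):
            # AMS keys the data points by epoch milliseconds.
            when = datetime.fromtimestamp(int(ts_ms) / 1000, tz=timezone.utc)
            rows.append((when.isoformat(), value, value / host_count))
    return rows


rows = summarize(AMS_RESPONSE, HOST_COUNT)
for when, total, per_host in rows:
    print(f"{when}  sum={total:.2f}  per-host={per_host:.2f}")
```

Under that assumption the per-host idle value stays within 0 to 100, which is what a correctly summed percentage metric should give.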
      

      But when the same metric is queried through the Ambari REST API, cpu_idle._sum is about 100 times smaller:

      curl -s -k -u admin:admin http://sandbox-hdp.hortonworks.com:8080/api/v1/clusters/Sandbox/services/YARN/components/NODEMANAGER -G --data-urlencode 'fields=metrics/cpu/cpu_idle._sum[1521454950,1521455550,15]'
      ...(snip)...
        "metrics" : {
          "cpu" : {
            "cpu_idle._sum" : [
              [
                1.8686666666666667,
                1521454950
              ],
              [
                1.9843333333333333,
                1521454980
              ],
              [
                1.9,
                1521455010
              ],
              [
                1.9846666666666664,
                1521455040
              ],
              [
                1.8926666666666665,
                1521455070
              ],
      ...(snip)...
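
To put a number on the discrepancy, the following Python sketch compares the mean of the AMS data points with the mean of the Ambari data points quoted above. The two queries do not cover exactly the same time window, so this is only a rough estimate of the scaling factor, not an exact measurement.

```python
from statistics import mean

# cpu_idle._sum values returned directly by AMS (port 6188).
ams_values = [198.99000000000001, 192.56999999999999]

# cpu_idle._sum values returned by the Ambari REST API (port 8080).
ambari_values = [
    1.8686666666666667,
    1.9843333333333333,
    1.9,
    1.9846666666666664,
    1.8926666666666665,
]

# Ratio of the averages; a value near 100 suggests a percent-vs-fraction
# style scaling error rather than random noise.
ratio = mean(ams_values) / mean(ambari_values)
print(f"AMS / Ambari ratio: {ratio:.1f}")
```

A ratio close to 100 is consistent with the value being divided by 100 somewhere between AMS and the Ambari API response.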
      

      For some reason, 'cpu_idle._sum' is always wrong among the metrics this widget requests:

      curl -s -k -u admin:admin http://sandbox-hdp.hortonworks.com:8080/api/v1/clusters/Sandbox/services/YARN/components/NODEMANAGER -G --data-urlencode 'fields=metrics/cpu/cpu_nice._sum[1521196167,1521199767,15],metrics/cpu/cpu_idle._avg[1521196167,1521199767,15],metrics/cpu/cpu_wio._sum[1521196167,1521199767,15],metrics/cpu/cpu_idle._sum[1521196167,1521199767,15],metrics/cpu/cpu_user._sum[1521196167,1521199767,15],metrics/cpu/cpu_system._sum[1521196167,1521199767,15]' -o ./ambari_NODEMANAGER_metrics.json
      
      [root@sandbox-hdp ~]# grep -E -B1 '"cpu_|1521199500' ambari_NODEMANAGER_metrics.json | grep -vE -- '(--|\],)'
          "cpu" : {
            "cpu_idle._avg" : [
                85.54999999999998,
                1521199500
            "cpu_idle._sum" : [
                1.7109999999999996,     <<< needs to be multiplied by 100
                1521199500
            "cpu_nice._sum" : [
                0.0,
                1521199500
            "cpu_system._sum" : [
                21.900000000000002,
                1521199500
            "cpu_user._sum" : [
                6.666666666666666,
                1521199500
            "cpu_wio._sum" : [
                0.2,
                1521199500
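
The numbers above can be cross-checked against each other. With two NodeManager hosts (an assumption based on the reproduction steps), cpu_idle._sum should equal cpu_idle._avg times 2. The Python sketch below verifies that, and then shows how the bad value would inflate a CPU-usage figure. The `busy_percent` formula is a hypothetical stand-in for the widget's calculation, not taken from the Ambari source.

```python
HOST_COUNT = 2  # assumption: two NodeManager hosts

# Values at timestamp 1521199500, copied from the Ambari output above.
reported = {
    "cpu_idle._avg": 85.54999999999998,
    "cpu_idle._sum": 1.7109999999999996,
    "cpu_nice._sum": 0.0,
    "cpu_system._sum": 21.900000000000002,
    "cpu_user._sum": 6.666666666666666,
    "cpu_wio._sum": 0.2,
}

# What the sum should be if it were consistent with the average.
expected_idle_sum = reported["cpu_idle._avg"] * HOST_COUNT
scale_error = expected_idle_sum / reported["cpu_idle._sum"]


def busy_percent(idle_sum):
    """Hypothetical widget formula: share of non-idle CPU time, in percent."""
    other = sum(reported[k] for k in
                ("cpu_nice._sum", "cpu_system._sum",
                 "cpu_user._sum", "cpu_wio._sum"))
    return 100.0 * other / (other + idle_sum)


busy_reported = busy_percent(reported["cpu_idle._sum"])
busy_corrected = busy_percent(expected_idle_sum)

print(f"expected cpu_idle._sum: {expected_idle_sum:.2f}")
print(f"scale error:            {scale_error:.2f}x")
print(f"busy % (as reported):   {busy_reported:.1f}")
print(f"busy % (corrected):     {busy_corrected:.1f}")
```

Under this assumed formula the under-reported idle sum alone is enough to push the computed usage above 90%, while the corrected value gives a figure below 20%, which matches the symptom described in this issue.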
      

      Attachments

        1. image-2018-03-19-20-26-44-325.png
          202 kB
          Jonathan Hurley
        2. image-2018-03-19-20-27-19-160.png
          65 kB
          Jonathan Hurley

          People

            Assignee: Krisztian Kasa (kkasa)
            Reporter: Jonathan Hurley (jonathanhurley)

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 1h