Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 2.5.0
None
Description
ISSUE
In Ambari, YARN's "Cluster CPU" widget always shows relatively high CPU usage when the cluster has more than one NodeManager.
(Another node was started at around 19:00.)
REPRODUCE STEPS
- Install a cluster with one NodeManager and AMS.
- Confirm that the "Cluster CPU" widget looks OK.
- Add one more node with a NodeManager, and wait for a while.
INVESTIGATION
The AMS side looks OK:
    curl -s -k http://sandbox-hdp.hortonworks.com:6188/ws/v1/timeline/metrics -G \
      --data-urlencode metricNames=cpu_idle._sum \
      --data-urlencode appId=NODEMANAGER \
      --data-urlencode startTime=1521454794 \
      --data-urlencode endTime=1521455394 \
      --data-urlencode precision=MINUTES
    ...
    {
      "metrics": [
        {
          "appid": "nodemanager",
          "metadata": {},
          "metricname": "cpu_idle._sum",
          "metrics": {
            "1521454800000": 198.99000000000001,
            "1521455100000": 192.56999999999999
          },
          "starttime": 1521454800000,
          "timestamp": 1521454800000
        }
      ]
    }
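As a rough sanity check of the AMS numbers, the cluster-wide cpu_idle._sum can be divided by the number of NodeManagers to get an average idle percentage per node, which must fall between 0 and 100. This is a minimal sketch, assuming two NodeManagers (the original node plus the one added in the reproduce steps); it only parses the response body shown above, and `average_idle_per_node` is a hypothetical helper, not part of AMS.

```python
import json
from typing import Dict

# Response body from the AMS query above (metric values copied verbatim).
AMS_RESPONSE = """
{"metrics": [{"appid": "nodemanager", "metadata": {}, "metricname": "cpu_idle._sum",
  "metrics": {"1521454800000": 198.99000000000001, "1521455100000": 192.56999999999999},
  "starttime": 1521454800000, "timestamp": 1521454800000}]}
"""

# Assumption: one original NodeManager plus the one added in the reproduce steps.
NODEMANAGER_COUNT = 2


def average_idle_per_node(response_text: str, node_count: int) -> Dict[str, float]:
    """Convert a cluster-wide cpu_idle._sum series into average idle % per node."""
    series = json.loads(response_text)["metrics"][0]["metrics"]
    return {ts: value / node_count for ts, value in series.items()}


for ts, idle in average_idle_per_node(AMS_RESPONSE, NODEMANAGER_COUNT).items():
    # A plausible value lies in [0, 100]; values near 100 mean mostly idle nodes.
    print(ts, f"{idle:.2f}")
```

Both buckets come out at roughly 99.5% and 96.3% idle per node, which is plausible for a mostly idle sandbox.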
But when fetched through the Ambari REST API, cpu_idle._sum is about 100 times smaller:
    curl -s -k -u admin:admin http://sandbox-hdp.hortonworks.com:8080/api/v1/clusters/Sandbox/services/YARN/components/NODEMANAGER -G \
      --data-urlencode 'fields=metrics/cpu/cpu_idle._sum[1521454950,1521455550,15]'
    ...(snip)...
    "metrics" : {
      "cpu" : {
        "cpu_idle._sum" : [
          [ 1.8686666666666667, 1521454950 ],
          [ 1.9843333333333333, 1521454980 ],
          [ 1.9, 1521455010 ],
          [ 1.9846666666666664, 1521455040 ],
          [ 1.8926666666666665, 1521455070 ],
          ...(snip)...
For some reason, 'cpu_idle._sum' is always wrong for this widget:
    curl -s -k -u admin:admin http://sandbox-hdp.hortonworks.com:8080/api/v1/clusters/Sandbox/services/YARN/components/NODEMANAGER -G \
      --data-urlencode 'fields=metrics/cpu/cpu_nice._sum[1521196167,1521199767,15],metrics/cpu/cpu_idle._avg[1521196167,1521199767,15],metrics/cpu/cpu_wio._sum[1521196167,1521199767,15],metrics/cpu/cpu_idle._sum[1521196167,1521199767,15],metrics/cpu/cpu_user._sum[1521196167,1521199767,15],metrics/cpu/cpu_system._sum[1521196167,1521199767,15]' \
      -o ./ambari_NODEMANAGER_metrics.json
    [root@sandbox-hdp ~]# grep -E -B1 '"cpu_|1521450000' ambari_NODEMANAGER_metrics.json | grep -vE -- '(--|\],)'
    "cpu" : {
    "cpu_idle._avg" : [
    85.54999999999998,
    1521199500
    "cpu_idle._sum" : [
    1.7109999999999996,      <<< need to multiply by 100
    1521199500
    "cpu_nice._sum" : [
    0.0,
    1521199500
    "cpu_system._sum" : [
    21.900000000000002,
    1521199500
    "cpu_user._sum" : [
    6.666666666666666,
    1521199500
    "cpu_wio._sum" : [
    0.2,
    1521199500
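The factor of 100 can be checked with simple arithmetic on the values in the grep output above. With N NodeManagers, cpu_idle._sum should be about N × cpu_idle._avg, and the per-state sums (nice, system, user, wio, idle) should add up to roughly N × 100%. This is a minimal sketch, assuming N = 2 and that these five states account for nearly all CPU time; the helper names are hypothetical.

```python
from typing import Dict

# Values copied from the Ambari response above (timestamp 1521199500).
CPU_IDLE_AVG = 85.54999999999998
SUMS = {
    "cpu_nice._sum": 0.0,
    "cpu_system._sum": 21.900000000000002,
    "cpu_user._sum": 6.666666666666666,
    "cpu_wio._sum": 0.2,
    "cpu_idle._sum": 1.7109999999999996,
}
NODEMANAGER_COUNT = 2  # assumption: two NodeManagers in the cluster


def idle_scale_factor(idle_sum: float, idle_avg: float, node_count: int) -> float:
    """Ratio of the expected idle sum (avg x node count) to the reported idle sum."""
    return (idle_avg * node_count) / idle_sum


def total_cpu_percent(sums: Dict[str, float]) -> float:
    """Add up the per-state CPU sums across all nodes."""
    return sum(sums.values())


factor = idle_scale_factor(SUMS["cpu_idle._sum"], CPU_IDLE_AVG, NODEMANAGER_COUNT)
reported_total = total_cpu_percent(SUMS)

# Same values, but with cpu_idle._sum multiplied by 100.
corrected = {**SUMS, "cpu_idle._sum": SUMS["cpu_idle._sum"] * 100}
corrected_total = total_cpu_percent(corrected)

print(f"scale factor for cpu_idle._sum: {factor:.2f}")
print(f"total as reported: {reported_total:.2f}% (expected about {NODEMANAGER_COUNT * 100}%)")
print(f"total with cpu_idle._sum x 100: {corrected_total:.2f}%")
```

The scale factor comes out at 100, and the reported states add up to only about 30.5% instead of roughly 200%; multiplying cpu_idle._sum by 100 brings the total to about 199.9%.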