Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 2.4.0
- None
- Reviewed
Description
YARN-1697 fixed a problem where the NodeManager metrics could report a negative number of running containers. However, it missed a rare case where this can still happen.
YARN-1697 added a flag that records whether the container was actually launched (LOCALIZED to RUNNING) or not (LOCALIZED to KILLING). The flag is checked on the transitions from CONTAINER_CLEANEDUP_AFTER_KILL to DONE and from EXITED_WITH_FAILURE to DONE, so the gauge is decremented only if the container actually ran and the gauge was incremented. However, the flag is not checked on the transition from EXITED_WITH_SUCCESS to DONE.
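The effect of the missing check can be illustrated with a minimal, self-contained sketch. This is not the NodeManager code: the class, field, and method names below (ContainerGaugeSketch, wasLaunched, gaugeAfterDone) are hypothetical, and the gauge is modeled as a plain counter. It only assumes what the description states, namely that the gauge is incremented on launch and should be decremented on the transition to DONE only if the launch happened.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names do not come from the Hadoop code base.
class ContainerGaugeSketch {

    /** Simplified container that remembers whether it was actually launched. */
    static class Container {
        private final AtomicInteger runningGauge;
        private boolean wasLaunched = false;

        Container(AtomicInteger runningGauge) {
            this.runningGauge = runningGauge;
        }

        // Models LOCALIZED -> RUNNING: the container starts, so the gauge goes up.
        void launch() {
            wasLaunched = true;
            runningGauge.incrementAndGet();
        }

        // Models a transition to DONE. With checkFlag == false the gauge is
        // decremented unconditionally, which is the behavior described for
        // EXITED_WITH_SUCCESS -> DONE.
        void done(boolean checkFlag) {
            if (!checkFlag || wasLaunched) {
                runningGauge.decrementAndGet();
            }
        }
    }

    /** Runs one container through its lifecycle and returns the final gauge value. */
    static int gaugeAfterDone(boolean launched, boolean checkFlag) {
        AtomicInteger gauge = new AtomicInteger(0);
        Container container = new Container(gauge);
        if (launched) {
            container.launch();
        }
        container.done(checkFlag);
        return gauge.get();
    }

    public static void main(String[] args) {
        System.out.println("never launched, flag ignored: " + gaugeAfterDone(false, false));
        System.out.println("never launched, flag checked: " + gaugeAfterDone(false, true));
        System.out.println("launched, flag checked:       " + gaugeAfterDone(true, true));
    }
}
```

In this sketch, a container that was never launched drives the gauge to -1 when the flag is ignored and leaves it at 0 when the flag is checked. A launched container ends at 0 either way.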
Attachments
Issue Links
- is related to: YARN-1697 NodeManager reports negative running containers (Closed)