Hadoop YARN / YARN-6715

Fix documentation about NodeHealthScriptRunner


Details

    • Reviewed

    Description

      NodeHealthScriptRunner does not report the node as unhealthy when the script exits with a non-zero exit code. See the FAILED_WITH_EXIT_CODE case:

          void reportHealthStatus(HealthCheckerExitStatus status) {
            long now = System.currentTimeMillis();
            switch (status) {
            case SUCCESS:
              setHealthStatus(true, "", now);
              break;
            case TIMED_OUT:
              setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
              break;
            case FAILED_WITH_EXCEPTION:
              setHealthStatus(false, exceptionStackTrace);
              break;
            case FAILED_WITH_EXIT_CODE:
              setHealthStatus(true, "", now);
              break;
            case FAILED:
              setHealthStatus(false, shexec.getOutput());
              break;
            }
          }
      

      Based on the discussion in YARN-5567, this is intentional, but it conflicts with the upstream documentation, which says:
      "If the script exits with a non-zero exit code, times out or results in an exception being thrown, the node is marked as unhealthy"

      This statement is misleading and must be corrected. We could also add a comment to reportHealthStatus() explaining that the FAILED_WITH_EXIT_CODE handling is intentional, not a bug.

      This case also lacks unit test coverage.
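
      To illustrate the mapping described above, here is a minimal standalone sketch. It is not the actual NodeHealthScriptRunner code: the class HealthStatusSketch, the enum ExitStatus and the helper isHealthy() are hypothetical names, and the sketch only mirrors the switch statement quoted in this description. A real unit test would have to exercise NodeHealthScriptRunner itself.

```java
// Hypothetical stand-in for the status handling quoted in the description.
class HealthStatusSketch {

  // Same status names as in the quoted reportHealthStatus() snippet.
  enum ExitStatus {
    SUCCESS, TIMED_OUT, FAILED_WITH_EXCEPTION, FAILED_WITH_EXIT_CODE, FAILED
  }

  // Returns true if the node would be reported healthy for the given status.
  static boolean isHealthy(ExitStatus status) {
    switch (status) {
      case SUCCESS:
      case FAILED_WITH_EXIT_CODE:
        // Intentional per this issue: a non-zero exit code alone
        // does not mark the node as unhealthy.
        return true;
      case TIMED_OUT:
      case FAILED_WITH_EXCEPTION:
      case FAILED:
        return false;
      default:
        throw new IllegalArgumentException("Unknown status: " + status);
    }
  }

  public static void main(String[] args) {
    // Print the health decision for every status.
    for (ExitStatus s : ExitStatus.values()) {
      System.out.println(s + " -> " + (isHealthy(s) ? "healthy" : "unhealthy"));
    }
  }
}
```

      The point of the sketch is that FAILED_WITH_EXIT_CODE lands in the same branch as SUCCESS, which is the case a new unit test should pin down.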

      Attachments

        1. YARN-6715-001.patch
          5 kB
          Peter Bacsko
        2. YARN-6715-002.patch
          5 kB
          Peter Bacsko
        3. YARN-6715-003.patch
          4 kB
          Peter Bacsko
        4. YARN-6715-branch-3.1.001.patch
          4 kB
          Peter Bacsko
        5. YARN-6715-branch-3.2.001.patch
          4 kB
          Peter Bacsko


          People

            pbacsko Peter Bacsko
            pbacsko Peter Bacsko
            Votes: 0
            Watchers: 7
