Hadoop YARN / YARN-9923

Introduce HealthReporter interface to support multiple health checker files


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.1
    • Fix Version/s: 3.3.0
    • Component/s: nodemanager, yarn
    • Labels:
      None
    • Target Version/s:

      Description

      Currently, if a NodeManager is enabled to allocate Docker containers but the specified binary (docker.binary in container-executor.cfg) is missing, container allocation fails with the following error message:

      Container launch fails
      Exit code: 29
      Exception message: Launch container failed
      Shell error output: sh: <docker binary path, /usr/bin/docker by default>: No such file or directory
      Could not inspect docker network to get type /usr/bin/docker network inspect host --format='{{.Driver}}'.
      Error constructing docker command, docker error code=-1, error message='Unknown error'
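
      The failure above only surfaces once a container is launched. A check that detects the missing binary up front could look like the following minimal Java sketch. It assumes the docker.binary value from container-executor.cfg has already been read; DockerBinaryCheck and check() are hypothetical names, not part of Hadoop, and the sketch does not test whether the Docker daemon is running.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/**
 * Hypothetical pre-check (not a Hadoop class): verifies that the configured
 * Docker binary exists and is executable before any container launch.
 */
public final class DockerBinaryCheck {

  /** Default binary path, as mentioned in the error message above. */
  static final String DEFAULT_DOCKER_BINARY = "/usr/bin/docker";

  private DockerBinaryCheck() {}

  /**
   * Returns an empty Optional if the binary looks usable, otherwise a
   * human-readable explanation suitable for the NodeManager log.
   */
  static Optional<String> check(String configuredPath) {
    String path = (configuredPath == null || configuredPath.isEmpty())
        ? DEFAULT_DOCKER_BINARY : configuredPath;
    Path p = Paths.get(path);
    if (!Files.exists(p)) {
      return Optional.of("Docker binary not found at " + path
          + " (check docker.binary in container-executor.cfg)");
    }
    if (!Files.isRegularFile(p) || !Files.isExecutable(p)) {
      return Optional.of("Docker binary at " + path + " is not an executable file");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Optional first argument overrides the default path.
    Optional<String> problem = check(args.length > 0 ? args[0] : null);
    System.out.println(problem.orElse("docker binary OK"));
  }
}
```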
      

      I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", with the following options:

      • STARTUP: with this option set, the NodeManager would not start if the Docker binaries are missing or the Docker daemon is not running (the exception is considered FATAL during startup).
      • RUNTIME: the NodeManager would log a more detailed, user-friendly exception (in the NM logs) if the Docker binaries are missing or the daemon is not working. This would also prevent further Docker container allocation as long as the binaries are missing or the Docker daemon is not running.
      • NONE (default): preserves the current behaviour: an exception is thrown during container allocation and the default retry procedure is used.
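
      The three proposed modes can be modelled as a small enum plus a mapping from a failed check to the resulting action. This is a minimal sketch assuming the property value is a case-insensitive string and that an unset value means NONE; DockerCheckPolicy, Mode and Action are hypothetical names, and yarn.nodemanager.runtime.linux.docker.check is only the property proposed here, not an existing YARN setting.

```java
import java.util.Locale;

/** Hypothetical model of the proposed Docker check modes (not a Hadoop class). */
public final class DockerCheckPolicy {

  /** Proposed property name from this issue; not an existing YARN key. */
  static final String DOCKER_CHECK_KEY = "yarn.nodemanager.runtime.linux.docker.check";

  enum Mode { STARTUP, RUNTIME, NONE }

  /** What the NodeManager would do when the Docker check fails. */
  enum Action { FAIL_STARTUP, BLOCK_DOCKER_ALLOCATION, FAIL_AT_LAUNCH }

  private DockerCheckPolicy() {}

  /**
   * Parses the configured value; unset or blank falls back to NONE.
   * Unknown values make Mode.valueOf throw IllegalArgumentException.
   */
  static Mode parse(String value) {
    if (value == null || value.trim().isEmpty()) {
      return Mode.NONE;
    }
    return Mode.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  /** Maps a failed check to the behaviour described for each mode. */
  static Action onCheckFailure(Mode mode) {
    switch (mode) {
      case STARTUP:
        return Action.FAIL_STARTUP;            // fatal: NodeManager does not start
      case RUNTIME:
        return Action.BLOCK_DOCKER_ALLOCATION; // log details, stop Docker allocation
      case NONE:
      default:
        return Action.FAIL_AT_LAUNCH;          // current behaviour plus default retry
    }
  }

  public static void main(String[] args) {
    Mode mode = parse(args.length > 0 ? args[0] : null);
    System.out.println(DOCKER_CHECK_KEY + "=" + mode + " -> " + onCheckFailure(mode));
  }
}
```

      Rejecting unknown values instead of silently falling back to NONE would surface a configuration typo immediately.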

      ------------------------------------------------------------------------------------------------

      A new interface called HealthChecker is introduced and used by the NodeHealthCheckerService. Existing implementations such as LocalDirsHandlerService are modified to implement it, giving a clear abstraction of the node's health. The DockerHealthChecker implements this new interface.
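
      The abstraction described above can be sketched as one interface implemented by every health source and a service that aggregates them. The method names isHealthy() and getHealthReport(), as well as FixedChecker, DockerProbeChecker and NodeHealthAggregator, are assumptions for illustration and do not claim to match the committed Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Minimal sketch of the health abstraction; names are hypothetical. */
public final class HealthCheckSketch {

  /** One source of node health (e.g. local dirs, Docker). */
  interface HealthChecker {
    boolean isHealthy();
    /** Empty when healthy, otherwise a short explanation. */
    String getHealthReport();
  }

  /** Checker with a fixed result, standing in for e.g. a local-dirs check. */
  static final class FixedChecker implements HealthChecker {
    private final boolean healthy;
    private final String report;
    FixedChecker(boolean healthy, String report) { this.healthy = healthy; this.report = report; }
    @Override public boolean isHealthy() { return healthy; }
    @Override public String getHealthReport() { return healthy ? "" : report; }
  }

  /** Docker checker backed by an arbitrary probe (e.g. "is the daemon up?"). */
  static final class DockerProbeChecker implements HealthChecker {
    private final BooleanSupplier probe;
    DockerProbeChecker(BooleanSupplier probe) { this.probe = probe; }
    @Override public boolean isHealthy() { return probe.getAsBoolean(); }
    @Override public String getHealthReport() {
      return probe.getAsBoolean() ? "" : "Docker daemon is not reachable";
    }
  }

  /** Aggregates any number of checkers, in the spirit of NodeHealthCheckerService. */
  static final class NodeHealthAggregator {
    private final List<HealthChecker> checkers = new ArrayList<>();

    void register(HealthChecker checker) { checkers.add(checker); }

    /** The node is healthy only if every registered checker is healthy. */
    boolean isHealthy() {
      for (HealthChecker c : checkers) {
        if (!c.isHealthy()) {
          return false;
        }
      }
      return true;
    }

    /** Joins the non-empty reports, i.e. those of unhealthy checkers. */
    String getHealthReport() {
      List<String> parts = new ArrayList<>();
      for (HealthChecker c : checkers) {
        String r = c.getHealthReport();
        if (!r.isEmpty()) {
          parts.add(r);
        }
      }
      return String.join(" | ", parts);
    }
  }

  public static void main(String[] args) {
    NodeHealthAggregator node = new NodeHealthAggregator();
    node.register(new FixedChecker(true, "local dirs are bad"));
    node.register(new DockerProbeChecker(() -> false));
    System.out.println(node.isHealthy() + " " + node.getHealthReport());
  }
}
```

      Because the aggregator depends only on the interface, adding another health source (for example a second health script) would mean registering one more implementation rather than changing the aggregation logic.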

        Attachments

        1. YARN-9923.001.patch (93 kB, Adam Antal)
        2. YARN-9923.002.patch (103 kB, Adam Antal)
        3. YARN-9923.003.patch (109 kB, Adam Antal)
        4. YARN-9923.004.patch (113 kB, Adam Antal)
        5. YARN-9923.005.patch (108 kB, Adam Antal)
        6. YARN-9923.006.patch (117 kB, Adam Antal)
        7. YARN-9923.007.patch (116 kB, Adam Antal)
        8. YARN-9923.008.patch (116 kB, Adam Antal)
        9. YARN-9923.009.patch (118 kB, Adam Antal)
        10. YARN-9923.010.patch (118 kB, Adam Antal)


            People

            • Assignee: Adam Antal (adam.antal)
            • Reporter: Adam Antal (adam.antal)
            • Votes: 1
            • Watchers: 6

              Dates

              • Created:
              • Updated:
              • Resolved: