  1. Hadoop YARN
  2. YARN-894

NodeHealthScriptRunner timeout checking is inaccurate on Windows



    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.0-beta, 3.0.0-alpha1
    • Fix Version/s: 2.1.0-beta
    • Component/s: nodemanager
    • Labels:
    • Target Version/s:
    • Hadoop Flags:


      NodeHealthScriptRunner sets the HealthChecker status based on the result of the Shell execution. Some statuses are derived from the exception thrown while the Shell script runs.

      Currently, we catch any non-ExitCodeException thrown by ShellCommandExecutor, and if Shell also has its timeout status set at that point, we set the HealthChecker status to timeout.

      Shell executes the script in the following sequence:
      1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout.
      2) In the main thread, open a buffered reader on the process's standard output (the stream returned by Process#getInputStream()) and read from it.
      3) When the timeout occurs, the timer task calls Process#destroy() to kill the process.
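
      The sequence above can be sketched in isolation as follows. This is a minimal illustration, not Hadoop's Shell code: the class TimedShellSketch, its Result type, and the choice to merge stderr into stdout are all assumptions made to keep the example short. It only records whether the reader threw, so the platform difference can be observed directly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimedShellSketch {

  /** Outcome of one run. All names here are illustrative, not Hadoop's. */
  public static final class Result {
    public final String output;         // everything read from the child's stdout
    public final boolean timedOut;      // true if the timer task killed the child
    public final IOException readError; // non-null if the reader threw

    Result(String output, boolean timedOut, IOException readError) {
      this.output = output;
      this.timedOut = timedOut;
      this.readError = readError;
    }
  }

  public static Result run(long timeoutMs, String... command) throws IOException {
    // Assumption: stderr is merged into stdout so a single reader suffices.
    final Process process =
        new ProcessBuilder(command).redirectErrorStream(true).start();
    final AtomicBoolean timedOut = new AtomicBoolean(false);

    // Step 1: schedule a delayed task that kills the child on timeout.
    Timer timer = new Timer("timeout-killer", true);
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        if (process.isAlive()) { // only a timeout if the child is still running
          timedOut.set(true);
          process.destroy();     // Step 3: kill the child
        }
      }
    }, timeoutMs);

    // Step 2: read the child's stdout in the calling thread.
    StringBuilder out = new StringBuilder();
    IOException readError = null;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        process.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        out.append(line).append('\n');
      }
    } catch (IOException e) {
      readError = e; // recorded rather than rethrown, so callers can inspect it
    } finally {
      timer.cancel();
    }
    return new Result(out.toString(), timedOut.get(), readError);
  }
}
```

      Running this against a script that sleeps longer than the timeout and then inspecting timedOut and readError should show whether the reader ends with an exception or with a plain end of stream on a given platform.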

      On Linux, when the timeout occurs and the process is killed, the buffered reader throws an IOException with the message "Stream closed" in the main thread.

      On Windows, no IOException is thrown. The reader only returns -1, which indicates that the end of the stream has been reached. As a result, the timeout status is not set on Windows, and TestNodeHealthService fails on Windows.
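
      One platform-neutral way to avoid depending on the IOException is to consult the timed-out flag first, regardless of how the read ended. The sketch below illustrates that idea only; it is not the attached patch, and the class HealthStatusSketch, its Status enum, and the classify method are hypothetical names chosen for this example.

```java
public class HealthStatusSketch {

  /** Hypothetical status values, named for illustration only. */
  public enum Status { SUCCESS, TIMED_OUT, FAILED_WITH_EXIT_CODE, FAILED_WITH_EXCEPTION }

  /**
   * Classifies one script run.
   *
   * @param timedOut true if the timeout task killed the process
   * @param exitCode exit code reported for the process
   * @param error    exception seen while running or reading, or null
   */
  public static Status classify(boolean timedOut, int exitCode, Exception error) {
    // Check the flag first: a timeout is a timeout whether the reader
    // threw an exception or simply reached end of stream.
    if (timedOut) {
      return Status.TIMED_OUT;
    }
    if (error != null) {
      return Status.FAILED_WITH_EXCEPTION;
    }
    if (exitCode != 0) {
      return Status.FAILED_WITH_EXIT_CODE;
    }
    return Status.SUCCESS;
  }
}
```

      With this ordering, a Linux-style outcome (flag set, IOException seen) and a Windows-style outcome (flag set, no exception, non-zero exit code) both classify as TIMED_OUT.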


        1. ReadProcessStdout.java
          1 kB
          Chuan Liu
        2. wait.cmd
          0.0 kB
          Chuan Liu
        3. wait.sh
          0.0 kB
          Chuan Liu
        4. YARN-894-trunk.patch
          3 kB
          Chuan Liu

          Issue Links



              • Assignee:
                chuanliu Chuan Liu
              • Votes:
                0
              • Watchers:
                3


                • Created: