Hadoop Common
HADOOP-8353

hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.1
    • Fix Version/s: 2.0.0-alpha
    • Component/s: scripts
    • Labels:
      None

      Description

      The stop action is implemented as a simple SIGTERM sent to the JVM. There is a delay between when the action is called and when the process actually exits. This can mislead callers of the *-daemon.sh scripts, since they expect the stop action to return only once the process has actually stopped.

      I suggest we augment the stop action so that it checks the process status for a bounded period after the SIGTERM and sends a SIGKILL once that period has expired.

      I understand that sending SIGKILL is a measure of last resort and is generally frowned upon among init.d script writers, but our excuse for Hadoop is that it is engineered to be a fault-tolerant system, so there is no danger of putting the system into an inconsistent state with a violent SIGKILL. Of course, the time delay will be long enough to make a SIGKILL a rare event.
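
The stop sequence proposed above (SIGTERM, a bounded wait, then SIGKILL) could be sketched roughly as follows. This is only an illustration and is not taken from the attached patches: the function name `stop_daemon`, its arguments, and the 5-second default are all assumptions.

```shell
# Hypothetical helper, not taken from the attached patches.
# Usage: stop_daemon PID [TIMEOUT_SECONDS]
# Note: plain POSIX sh has no "local", so these variables are global.
stop_daemon() {
  pid="$1"
  timeout="${2:-5}"   # assumed default grace period, in seconds
  waited=0

  # kill -0 sends no signal; it only checks that the process exists.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "no process $pid to stop"
    return 1
  fi

  kill "$pid"         # SIGTERM is the default signal

  # Poll once per second until the process exits or the grace period ends.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid still running after ${timeout}s, sending SIGKILL"
      kill -9 "$pid" 2>/dev/null
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A caller would pass the daemon's PID and a grace period, for example `stop_daemon "$(cat /path/to/daemon.pid)" 10`, where the pid-file path is a placeholder. Polling rather than sleeping for the full timeout lets the call return as soon as the process is gone.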

      Finally, there is always the option of an exponential back-off type of solution if we decide that the SIGKILL timeout is too short.
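
One possible reading of the back-off idea is to double the wait between status checks, so that a fast shutdown returns quickly while a slow one still gets a long grace period before SIGKILL. The sketch below assumes that reading; the name `stop_with_backoff` and the `max_delay` parameter are hypothetical and do not come from the issue or its patches.

```shell
# Hypothetical helper illustrating exponential back-off before SIGKILL.
# Usage: stop_with_backoff PID [MAX_DELAY_SECONDS]
# Note: plain POSIX sh has no "local", so these variables are global.
stop_with_backoff() {
  pid="$1"
  max_delay="${2:-8}"   # assumed: largest single wait, in seconds
  delay=1

  # Send SIGTERM; give up early if the process does not exist.
  kill "$pid" 2>/dev/null || return 1

  # Wait 1, 2, 4, ... seconds, checking the process after each wait.
  while [ "$delay" -le "$max_delay" ]; do
    sleep "$delay"
    if ! kill -0 "$pid" 2>/dev/null; then
      return 0          # process exited on its own
    fi
    delay=$((delay * 2))
  done

  # Still alive after the longest wait: fall back to SIGKILL.
  kill -9 "$pid" 2>/dev/null
  return 0
}
```

With the assumed default of 8, the waits are 1, 2, 4, and 8 seconds, so the process gets up to 15 seconds in total before SIGKILL is sent.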

      1. HADOOP-8353.patch.txt
        3 kB
        Roman Shaposhnik
      2. HADOOP-8353-2.patch.txt
        4 kB
        Roman Shaposhnik

            People

            • Assignee:
              Roman Shaposhnik
              Reporter:
              Roman Shaposhnik
            • Votes:
              0
              Watchers:
              3
