  1. Hadoop Common
  2. HADOOP-8353

hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop



    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.1
    • Fix Version/s: 2.0.0-alpha
    • Component/s: scripts
    • Labels:


      The stop action is currently implemented as a simple SIGTERM sent to the JVM. There is a delay between when the action is invoked and when the process actually exits. This can mislead callers of the *-daemon.sh scripts, since they expect the stop action to return only once the process has actually stopped.

      I suggest we augment the stop action with a time-delay check for the process status and a SIGKILL once the delay has expired.
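The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the function name `stop_with_timeout`, its arguments, and the 5-second default are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the patch): stop_with_timeout PID [TIMEOUT_SECONDS]
# Sends SIGTERM, polls the process once per second, and falls back to SIGKILL
# once the delay has expired.
stop_with_timeout() {
  local pid="$1"
  local timeout="${2:-5}"   # assumed default delay, in seconds
  local waited=0

  # Ask the process to shut down gracefully; if it cannot be signalled
  # (for example, it is already gone), there is nothing left to do.
  kill -TERM "$pid" 2>/dev/null || return 0

  # kill -0 sends no signal; it only checks whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid did not stop after ${timeout}s, sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

A caller would invoke it as `stop_with_timeout "$(cat "$pidfile")" 10`, where `$pidfile` stands for wherever the daemon script records the PID. With this shape the stop action returns only after the process has exited or has been sent SIGKILL.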

      I understand that sending SIGKILL is a measure of last resort and is generally frowned upon among init.d script writers, but the justification for Hadoop is that it is engineered to be a fault-tolerant system, so there is no danger of putting the system into an inconsistent state with a forced SIGKILL. Of course, the time delay will be long enough to make a SIGKILL a rare event.

      Finally, there's always the option of an exponential back-off type of solution if we decide that the SIGKILL timeout is too short.
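One possible reading of the back-off idea is to re-check the process after delays that double each time, and to send SIGKILL only after the last check. The sketch below assumes that interpretation; `stop_with_backoff`, the starting delay of 1 second, and the default of 5 attempts are all hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the patch): stop_with_backoff PID [MAX_ATTEMPTS]
# Sends SIGTERM, then checks the process after 1s, 2s, 4s, ... before
# resorting to SIGKILL.
stop_with_backoff() {
  local pid="$1"
  local max_attempts="${2:-5}"   # assumed default number of checks
  local delay=1                  # assumed initial delay, in seconds
  local attempt=0

  # Nothing to do if the process cannot be signalled (e.g. already gone).
  kill -TERM "$pid" 2>/dev/null || return 0

  while [ "$attempt" -lt "$max_attempts" ]; do
    # Return as soon as the process has exited.
    kill -0 "$pid" 2>/dev/null || return 0
    sleep "$delay"
    delay=$((delay * 2))         # double the wait before the next check
    attempt=$((attempt + 1))
  done

  # Last resort: the process survived every check.
  if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid" 2>/dev/null
  fi
  return 0
}
```

With the assumed defaults the script waits up to 1 + 2 + 4 + 8 + 16 = 31 seconds before sending SIGKILL, while a process that exits promptly is detected after the first short delay.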


        1. HADOOP-8353.patch.txt
          3 kB
          Roman Shaposhnik
        2. HADOOP-8353-2.patch.txt
          4 kB
          Roman Shaposhnik

              • Assignee:
                rvs Roman Shaposhnik
              • Votes:
                0
              • Watchers:
                3


                • Created: