  1. Hadoop Common
  2. HADOOP-8353

hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.1
    • Fix Version/s: 2.0.0-alpha
    • Component/s: scripts
    • Labels: None

    Description

      The stop action is implemented as a simple SIGTERM sent to the JVM. There is a delay between when the action is invoked and when the process actually exits. This can mislead callers of the *-daemon.sh scripts, since they expect the stop action to return only once the process has actually stopped.

      I suggest we augment the stop action with a time-delay check for the process status and a SIGKILL once the delay has expired.
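
      As a rough illustration of the proposal, the sketch below sends SIGTERM, polls the process once per second, and falls back to SIGKILL once the delay has expired. It is not the attached patch: the function name stop_with_timeout, its arguments, and the 5-second default are assumptions made for this example only.

```shell
# Hypothetical helper, not taken from hadoop-daemon.sh or the attached patch.
# Usage: stop_with_timeout PID [TIMEOUT_SECONDS]
stop_with_timeout() {
  local pid="$1"
  local timeout="${2:-5}"   # assumed default; the issue does not name a value
  local waited=0

  # Nothing to do if the process is already gone.
  kill -0 "$pid" 2>/dev/null || return 0

  # Ask the process to shut down gracefully.
  kill -TERM "$pid" 2>/dev/null || true

  # Poll once per second until the process exits or the delay expires.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid did not stop within ${timeout}s, sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null || true
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

      A caller would invoke it with the PID read from the daemon's pid file, for example stop_with_timeout "$(cat "$pidfile")" 5, where pidfile is whatever path the calling script already uses.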

      I understand that sending SIGKILL is a measure of last resort and is generally frowned upon among init.d script writers, but our excuse for Hadoop is that it is engineered to be a fault-tolerant system, so there is no danger of putting the system into an inconsistent state with a violent SIGKILL. Of course, the time delay will be long enough to make a SIGKILL a rare event.

      Finally, there's always the option of an exponential back-off type of solution if we decide that the SIGKILL timeout is too short.
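
      One possible reading of the back-off idea is sketched below: after SIGTERM, the script checks the process at intervals that double each time (1s, 2s, 4s, ...) up to a total budget, and only then sends SIGKILL. The issue does not spell out the scheme, so the function name stop_with_backoff, the doubling factor, and the 30-second default budget are all assumptions.

```shell
# Hypothetical helper illustrating one interpretation of "exponential back-off".
# Usage: stop_with_backoff PID [MAX_WAIT_SECONDS]
stop_with_backoff() {
  local pid="$1"
  local max_wait="${2:-30}"  # assumed total budget before SIGKILL
  local delay=1              # first polling interval, doubled after each check
  local waited=0

  # Nothing to do if the process is already gone.
  kill -0 "$pid" 2>/dev/null || return 0

  kill -TERM "$pid" 2>/dev/null || true

  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$max_wait" ]; then
      echo "process $pid still running after ${waited}s, sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null || true
      return 0
    fi
    # Never sleep past the remaining budget.
    if [ $((waited + delay)) -gt "$max_wait" ]; then
      delay=$((max_wait - waited))
    fi
    sleep "$delay"
    waited=$((waited + delay))
    delay=$((delay * 2))
  done
  return 0
}
```

      Compared with a fixed polling interval, this returns quickly for daemons that exit promptly while issuing fewer checks for slow ones, at the cost of noticing a late exit less promptly.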

      Attachments

        1. HADOOP-8353.patch.txt
          3 kB
          Roman Shaposhnik
        2. HADOOP-8353-2.patch.txt
          4 kB
          Roman Shaposhnik


            People

              Assignee: Roman Shaposhnik (rvs)
              Reporter: Roman Shaposhnik (rvs)
              Votes: 0
              Watchers: 3
