AMBARI-16914 (project: Ambari)

Ambari uses too small a window for region server shutdown

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.4.0
    • Component/s: ambari-web, stacks
    • Labels: None

    Description

      Ambari appears to issue a formal shutdown to a RegionServer, but it follows up with SIGKILL after only 30 seconds. On a fully loaded HBase system with about 200 regions per RegionServer and active transaction flow, a RegionServer cannot realistically stop within 30 seconds. This has caused many issues in production, including memstore corruption. Why not use the shutdown script that comes with HBase?
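      The escalation pattern described above works in three steps: request a graceful stop, wait a fixed window, then send SIGKILL. The following is a minimal Python sketch of that pattern for illustration, not Ambari's code. `pid_alive` and `stop_with_grace` are hypothetical helpers. SIGTERM stands in for the `hbase-daemon.sh ... stop regionserver` call. `grace_seconds` plays the role of the 30-second `timeout` shown in the log.

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """True if a process with this pid exists (signal 0 delivers nothing).

    Note: a child of the caller still counts as alive until it is reaped.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def stop_with_grace(pid: int, grace_seconds: float, poll_interval: float = 0.5) -> str:
    """Request a stop, wait up to grace_seconds, then escalate to SIGKILL.

    Hypothetical helper: SIGTERM stands in for the real stop command.
    Returns "not-running", "graceful" (exited inside the window),
    or "killed" (SIGKILL had to be sent).
    """
    if not pid_alive(pid):
        return "not-running"
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return "graceful"
        time.sleep(poll_interval)
    try:
        os.kill(pid, signal.SIGKILL)  # window expired: hard kill, no cleanup
    except ProcessLookupError:
        return "graceful"  # exited just as the window closed
    return "killed"
```

      With `grace_seconds=30`, a RegionServer that is still closing its regions would likely land in the "killed" branch. The reporter's point is that this window is too short for a loaded server to shut down cleanly.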

      2016-05-24 15:36:19,191 - Execute['/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf stop regionserver'] {'only_if': 'ambari-sudo.sh -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1', 'on_timeout': '! ( ambari-sudo.sh -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1 ) || ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid`', 'timeout': 30, 'user': 'hbase'}
      2016-05-24 15:36:50,982 - Executing '! ( ambari-sudo.sh -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1 ) || ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid`'. Reason: Execution of 'ambari-sudo.sh su hbase -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent'"'"' ; /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf stop regionserver'' was killed due timeout after 30 seconds
      2016-05-24 15:36:51,053 - File['/var/run/hbase/hbase-hbase-regionserver.pid'] {'action': ['delete']}
      2016-05-24 15:36:51,054 - Deleting File['/var/run/hbase/hbase-hbase-regionserver.pid'
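      The `only_if` and `on_timeout` strings in the log are shell guards keyed on the pid file. The following is a rough Python rendering of them, for illustration only. All function names are hypothetical. `os.kill(pid, 0)` approximates `ps -p PID`: both treat a process as present even when another user owns it.

```python
import os
import signal
from typing import Optional


def read_pid(pid_file: str) -> Optional[int]:
    """Like `test -f PIDFILE && cat PIDFILE`: the pid, or None if missing/unusable."""
    if not os.path.isfile(pid_file):
        return None
    with open(pid_file) as f:
        text = f.read().strip()
    if not text.isdigit():
        return None
    pid = int(text)
    # Reject 0: os.kill(0, ...) would target the caller's own process group.
    return pid if pid > 0 else None


def _pid_exists(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def is_running(pid_file: str) -> bool:
    """Like the `only_if` guard: pid file present and the process exists."""
    pid = read_pid(pid_file)
    return pid is not None and _pid_exists(pid)


def on_timeout(pid_file: str) -> bool:
    """Like `! (running) || kill -9 PID`: SIGKILL only if still running.

    Returns True if SIGKILL was sent.
    """
    pid = read_pid(pid_file)
    if pid is None or not _pid_exists(pid):
        return False
    os.kill(pid, signal.SIGKILL)
    return True


def remove_pid_file(pid_file: str) -> None:
    """Like File[PIDFILE] with action 'delete'; a missing file is not an error."""
    try:
        os.remove(pid_file)
    except FileNotFoundError:
        pass
```

      In this rendering, `on_timeout` decides between a clean stop and SIGKILL based solely on whether the process is still alive when the 30-second `timeout` expires. It does not check whether the RegionServer is still making shutdown progress.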

    Attachments

      1. AMBARI-16914.patch (10 kB, Andrew Onischuk)

    People

      Assignee: Andrew Onischuk (aonishuk)
      Reporter: Shankar Venkataraman (svenkataraman666)
      Votes: 0
      Watchers: 5
