HBase / HBASE-15190

Monkey dies when running on shared cluster (gives up when it can't kill the other fellow's processes)


Details

    • Reviewed

    Description

      When running the integration tests (IT) on a cluster shared with another user, the monkeys give up because they error out trying to kill the other fellow's daemons.

      Failure looks like this:

      16/01/29 09:07:09 WARN hbase.HBaseClusterManager: Remote command: ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:ve0536.halxg.cloudera.com failed at attempt 3. Retrying until maxAttempts: 5. Exception: stderr: kill 115040: Operation not permitted
      , stdout:
      

      The operation is not permitted because there is a regionserver running that is owned by someone else. We retry and then give up on the monkey.

      Fix seems simple.
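
      One way to avoid the failure described above is to signal only processes owned by the user running the monkey. The sketch below is an illustration under that assumption, not necessarily what the attached patches do. The helper name `own_pids` is hypothetical; it reads `ps aux`-style lines on stdin and prints only the PIDs that belong to the given user and match the given pattern.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): reads `ps aux`-style lines on stdin
# and prints the PID (column 2) of every process whose owner (column 1) equals
# $1 and whose line contains the literal string $2.
own_pids() {
  awk -v user="$1" -v pat="$2" '$1 == user && index($0, pat) > 0 { print $2 }'
}

# Usage sketch, left commented out so that sourcing this file kills nothing.
# Assumes `ps aux` prints the full user name in column 1 (long names may be
# truncated on some systems) and that xargs supports the GNU `-r` flag.
# ps aux | grep -v grep | own_pids "$(id -un)" proc_regionserver | xargs -r kill -s KILL
```

      With this filter, a regionserver owned by someone else never reaches `kill`, so the "Operation not permitted" error and the retries it triggers should not occur for that process.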

      Attachments

        1. shared.patch
          1 kB
          Michael Stack
        2. 15190.patch
          1 kB
          Michael Stack
        3. 15190.patch
          1 kB
          Michael Stack

          People

            stack Michael Stack