HBase / HBASE-12852

Tests from hbase-it that use ChaosMonkey don't fail if SSH commands fail

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 0.98.6
    • Fix Version/s: None
    • Component/s: integration tests
    • Labels: None

    Description

      I've just started rolling up my sleeves and playing with hbase-it (so far only on 0.98.6), and I want to file JIRAs for the issues I encounter so that I don't forget to get to them. First up: tests run with ChaosMonkey don't seem to fail when ChaosMonkey itself fails to perform its actions. For example, while running IntegrationTestIngest with the slowDeterministic ChaosMonkey, I forgot to set up SSH properly and saw the following:

      15/01/14 07:36:53 WARN hbase.ClusterManager: Remote command: ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal failed at attempt 4. Retrying until maxAttempts: 5. Exception: stderr: Permission denied, please try again.
      Permission denied, please try again.
      Permission denied (publickey,password).
      , stdout: 
      15/01/14 07:36:53 INFO util.RetryCounter: Sleeping 16000ms before retry #4...
      15/01/14 07:36:53 INFO zookeeper.ZooKeeper: Session: 0x14ae74d7bac006b closed
      15/01/14 07:36:53 INFO policies.Policy: Sleeping for: 59541
      15/01/14 07:36:53 INFO zookeeper.ClientCnxn: EventThread shut down
      Failed to write keys: 0
      Key range: [150000..159999]
      Batch updates: false
      Percent of keys to update: 60
      Updater threads: 10
      Ignore nonce conflicts: true
      Regions per server: 5
      15/01/14 07:36:56 INFO util.LoadTestTool: Starting to mutate data...
      Starting to mutate data...
      15/01/14 07:36:57 INFO policies.Policy: Sleeping for: 88816
      15/01/14 07:37:01 INFO util.MultiThreadedAction: [U:10] Keys=471, cols=5.7 K, time=00:00:05 Overall: [keys/s= 94, latency=102 ms] Current: [keys/s=94, latency=102 ms], wroteUpTo=149999
      15/01/14 07:37:06 INFO util.MultiThreadedAction: [U:10] Keys=908, cols=11.0 K, time=00:00:10 Overall: [keys/s= 90, latency=90 ms] Current: [keys/s=87, latency=77 ms], wroteUpTo=149999
      15/01/14 07:37:09 INFO hbase.ClusterManager: Executing remote command: ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal
      15/01/14 07:37:09 INFO util.Shell: Executing full command [/usr/bin/ssh  node-5.internal "ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL"]
      15/01/14 07:37:09 WARN policies.Policy: Exception occured during performing action: ExitCodeException exitCode=255: stderr: Permission denied, please try again.
      Permission denied, please try again.
      Permission denied (publickey,password).
      , stdout: 
      	at org.apache.hadoop.hbase.HBaseClusterManager.exec(HBaseClusterManager.java:208)
      	at org.apache.hadoop.hbase.HBaseClusterManager.execWithRetries(HBaseClusterManager.java:223)
      	at org.apache.hadoop.hbase.HBaseClusterManager.signal(HBaseClusterManager.java:268)
      	at org.apache.hadoop.hbase.ClusterManager.kill(ClusterManager.java:97)
      	at org.apache.hadoop.hbase.DistributedHBaseCluster.killRegionServer(DistributedHBaseCluster.java:110)
      	at org.apache.hadoop.hbase.chaos.actions.Action.killRs(Action.java:84)
      	at org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:50)
      	at org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
      	at org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:50)
      	at org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
      	at org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
      	at java.lang.Thread.run(Thread.java:745)
      

      It seems to me that tests should fail in these cases rather than just log a warning. Was this just an oversight, enis and ndimiduk, or is this by design?
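
      To make the request concrete, here is a minimal sketch of what "fail instead of warn" could look like. It is not taken from hbase-it: the class name ChaosFailureSketch, the FailureRecorder helper, and its methods are all hypothetical, and the sketch assumes the policy thread can hand exceptions to something the test inspects once the workload finishes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch; none of these types exist in hbase-it. */
public class ChaosFailureSketch {

  /** Remembers exceptions thrown by chaos actions instead of only logging them. */
  static final class FailureRecorder {
    private final List<Exception> failures = new ArrayList<>();

    /** Runs one action; on failure, logs a warning (as today) and also records it. */
    synchronized void runAction(String name, Callable<Void> action) {
      try {
        action.call();
      } catch (Exception e) {
        System.err.println("WARN action " + name + " failed: " + e.getMessage());
        failures.add(e);
      }
    }

    synchronized int failureCount() {
      return failures.size();
    }

    /** Meant to be called by the test at the end of the run. */
    synchronized void assertNoFailures() {
      if (!failures.isEmpty()) {
        Exception first = failures.get(0);
        throw new AssertionError(
            failures.size() + " chaos action(s) failed; first: " + first.getMessage(), first);
      }
    }
  }

  public static void main(String[] args) {
    FailureRecorder recorder = new FailureRecorder();

    // An action that succeeds.
    recorder.runAction("noop", () -> null);

    // Stand-in for the SSH kill command that was rejected in the log above.
    recorder.runAction("killRs", () -> {
      throw new IOException("exitCode=255: Permission denied (publickey,password).");
    });

    try {
      recorder.assertNoFailures();
    } catch (AssertionError e) {
      System.out.println("Test would fail: " + e.getMessage());
    }
  }
}
```

      Whether every failed action should fail the test, or only failures that mean ChaosMonkey cannot do anything at all (such as broken SSH), is a design question this sketch does not settle.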

          People

            Assignee: Unassigned
            Reporter: Dima Spivak (dimaspivak)
            Votes: 0
            Watchers: 6
