
Metron deploy with Kerberos fails on Ambari 2.5 during ES service stop

    Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Labels:
      None
    • Environment:
      12 node VM cluster running CentOS 7

      Description

      Metron deployment fails when Kerberos is enabled on a 12-node VM cluster managed by Ambari 2.5.2.

      The error occurs during the "Stop Services" step of kerberization, for the Elasticsearch Master and Elasticsearch Data Node services.

      I confirmed that the same deployment succeeds on Ambari 2.4.2, where I can set up the Kerberized cluster without errors.

      With Ambari 2.4, the "Elasticsearch Data Node Stop" step stops the slave and does not check the service status after the 'service stop' command is issued. With Ambari 2.5, the agent checks the status after issuing the stop command.

      In Ambari 2.4

       stdout:
      Stop the Slave
      2017-11-07 10:21:27,755 - Execute['service elasticsearch stop'] {}
      
      Command completed successfully!
      

      In Ambari 2.5

      Stop the Slave
      2017-11-07 10:12:48,481 - Execute['service elasticsearch stop'] {}
      2017-11-07 10:12:48,599 - Waiting for actual component stop
      Status of the Slave
      2017-11-07 10:12:48,600 - Execute['service elasticsearch status'] {}
      
      Command failed after 1 tries
      

      The status command appears to return exit code 3, which the Ambari agent treats as an error, so it marks the step as failed. The systemd output in the error log below shows the service as "inactive (dead)", meaning the stop itself succeeded.
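
      One possible way to handle this, sketched below under stated assumptions, is for the status check to treat exit code 3 as "stopped" rather than as a command failure. By LSB convention, a `status` action returns 0 when the service is running and 3 when it is not. The names `check_status` and `ComponentIsNotRunning` here are hypothetical stand-ins defined locally for illustration; this is not the Metron or Ambari code, and it is written for Python 3 rather than the Python 2.6 seen in the traceback. The simulated commands replace `service elasticsearch status` so the sketch runs anywhere.

```python
import subprocess
import sys

# LSB convention for a `status` action: 0 = running, 3 = not running.
LSB_RUNNING = 0
LSB_NOT_RUNNING = 3


class ComponentIsNotRunning(Exception):
    """Hypothetical local stand-in: signals 'component is stopped'."""


def check_status(status_cmd):
    """Run status_cmd (a list of args) and classify its exit code.

    Returns True if the service is running, raises ComponentIsNotRunning
    if it is cleanly stopped, and raises RuntimeError for any other code.
    """
    code = subprocess.call(
        status_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    if code == LSB_RUNNING:
        return True
    if code == LSB_NOT_RUNNING:
        raise ComponentIsNotRunning("status exited with %d" % code)
    raise RuntimeError("status exited with unexpected code %d" % code)


def is_stopped(status_cmd):
    """Post-stop check: True only when the status command reports 'not running'."""
    try:
        check_status(status_cmd)
    except ComponentIsNotRunning:
        return True
    return False


# Simulated status commands (assumptions for illustration only).
SIMULATED_RUNNING = [sys.executable, "-c", "import sys; sys.exit(0)"]
SIMULATED_STOPPED = [sys.executable, "-c", "import sys; sys.exit(3)"]
SIMULATED_BROKEN = [sys.executable, "-c", "import sys; sys.exit(1)"]
```

      With this shape, a post-stop check such as `is_stopped(SIMULATED_STOPPED)` succeeds on exit code 3, while any other non-zero code still surfaces as an error.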

      I am not entirely sure whether this should be handled by Metron or by Ambari. Please feel free to close this issue if it is deemed out of scope for Metron.

      Here is the full error log from the UI

      stderr:
      Traceback (most recent call last):
        File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 71, in <module>
          Elasticsearch().execute()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 332, in execute
          self.execute_prefix_function(self.command_name, 'after', env)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 350, in execute_prefix_function
          method(env)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 398, in after_stop
          status_method(env)
        File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 59, in status
          Execute(status_cmd)
        File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
          self.env.run()
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
          self.run_action(resource, action)
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
          provider_action()
        File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
          tries=self.resource.tries, try_sleep=self.resource.try_sleep)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
          result = function(command, **kwargs)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
          tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
          result = _call(command, **kwargs_copy)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
          raise ExecutionFailed(err_msg, code, out, err)
      resource_management.core.exceptions.ExecutionFailed: Execution of 'service elasticsearch status' returned 3. ● elasticsearch.service - Elasticsearch
         Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
         Active: inactive (dead)
           Docs: http://www.elastic.co
      
      Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,340][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false},}, reason: zen-disco-node_left({metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false})
      Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,466][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false},}, reason: zen-disco-node_left({metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false})
      Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,548][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false},}, reason: zen-disco-node_left({metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false})
      Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,713][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false},}, reason: zen-disco-node_left({metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false})
      Nov 07 10:12:48 metron-12 systemd[1]: Stopping Elasticsearch...
      Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,417][INFO ][node                     ] [metron-12.openstacklocal] stopping ...
      Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node                     ] [metron-12.openstacklocal] stopped
      Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node                     ] [metron-12.openstacklocal] closing ...
      Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,491][INFO ][node                     ] [metron-12.openstacklocal] closed
      Nov 07 10:12:48 metron-12 systemd[1]: Stopped Elasticsearch.
       stdout:
      Stop the Slave
      2017-11-07 10:12:49,025 - Execute['service elasticsearch stop'] {}
      2017-11-07 10:12:49,089 - Waiting for actual component stop
      Status of the Slave
      2017-11-07 10:12:49,090 - Execute['service elasticsearch status'] {}
      
      Command failed after 1 tries
      

              People

              • Assignee: Michael Miklavcic (msmiklavcic)
              • Reporter: Anand Subramanian (anandsubbu)