Ambari / AMBARI-18191

"Restart all required" services operation failed at Metrics Collector since HDFS was not yet up

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: trunk, 2.4.1
    • Component/s: ambari-metrics
    • Labels: None

    Description

      ambari-server --hash
      4017036da951a10f519a578de934308cf866ba50

      Steps

      1. Deploy HDP-2.3.6 cluster with Ambari 2.2.2.0 (AMS is configured in distributed mode)
      2. Upgrade Ambari to 2.4.0.0 and let it complete
      3. Open Ambari web UI and hit "Restart all required" under Actions menu

      Result
      The operation fails while restarting Metrics Collector because the collector attempts a WebHDFS call before HDFS has been started:

      Traceback (most recent call last):
        File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 148, in <module>
          AmsCollector().execute()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
          method(env)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 725, in restart
          self.start(env)
        File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 46, in start
          self.configure(env, action = 'start') # for security
        File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 41, in configure
          hbase('master', action)
        File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
          return fn(*args, **kwargs)
        File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/hbase.py", line 213, in hbase
          dfs_type=params.dfs_type
        File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
          self.env.run()
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
          self.run_action(resource, action)
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
          provider_action()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in action_create_on_execute
          self.action_delayed("create")
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 456, in action_delayed
          self.get_hdfs_resource_executor().action_delayed(action_name, self)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 256, in action_delayed
          self._set_mode(self.target_status)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 363, in _set_mode
          self.util.run_command(self.main_resource.resource.target, 'SETPERMISSION', method='PUT', permission=self.mode, assertable_result=False)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 179, in run_command
          _, out, err = get_user_call_output(cmd, user=self.run_user, logoutput=self.logoutput, quiet=False)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
          raise Fail(err_msg)
      resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --negotiate -u : 'http://vsharma-eu-mt-5.openstacklocal:50070/webhdfs/v1/user/ams/hbase?op=SETPERMISSION&user.name=hdfs&permission=775' 1>/tmp/tmp8twcZt 2>/tmp/tmpLPih9a' returned 7. curl: (7) couldn't connect to host
      401
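
      The curl exit code 7 above means the TCP connection to the NameNode HTTP port (50070 in this log) could not be established. A minimal sketch of a readiness probe that could run before any WebHDFS request is shown below. It is only an illustration: wait_for_port and its parameters are hypothetical names, it is written for Python 3 rather than the agent's Python 2.6, and it is not taken from the attached patch.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=2.0):
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses.

    Hypothetical helper for illustration; not part of resource_management.
    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect only shows that something is listening;
            # it does not prove that the NameNode has left safe mode.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

      With the host from the log, a caller could check wait_for_port("vsharma-eu-mt-5.openstacklocal", 50070) and skip or defer the SETPERMISSION request when it returns False.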
      

      Afterwards, restarting HDFS individually first and then hitting "Restart all Required" succeeded.
      The issue appears to be that the restart order is incorrect across hosts, so the services that others depend on are not brought up first.
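
      To illustrate what a dependency-respecting restart order means here, the sketch below sorts components so that each one starts only after the components it depends on. The dependency map and the restart_order function are assumptions made up for this example; they do not reproduce Ambari's actual ordering rules or the attached patch.

```python
from collections import deque

# Hypothetical dependency map: each component lists what must be running first.
# Names and edges are illustrative only, not Ambari's real command order.
DEPENDS_ON = {
    "NAMENODE": [],
    "DATANODE": ["NAMENODE"],
    "METRICS_COLLECTOR": ["NAMENODE", "DATANODE"],
    "METRICS_MONITOR": [],
}


def restart_order(depends_on):
    """Order components so dependencies come first (Kahn's algorithm)."""
    remaining = {name: set(deps) for name, deps in depends_on.items()}
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    # Start with components that have no unmet dependencies.
    ready = deque(sorted(n for n, deps in remaining.items() if not deps))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in sorted(dependents[current]):
            remaining[child].discard(current)
            if not remaining[child]:
                ready.append(child)

    if len(order) != len(depends_on):
        raise ValueError("dependency cycle detected")
    return order
```

      With this map, restart_order(DEPENDS_ON) places NAMENODE and DATANODE ahead of METRICS_COLLECTOR, which is the ordering the failed operation did not follow.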

      Attachments

        1. AMBARI-18191.patch
          6 kB
          Siddharth Wagle

            People

              Assignee: swagle Siddharth Wagle
              Reporter: SunithaVelpula Sunitha