Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
2.4.0
-
None
Description
ambari-server --hash
4017036da951a10f519a578de934308cf866ba50
Steps
- Deploy HDP-2.3.6 cluster with Ambari 2.2.2.0 (AMS is configured in distributed mode)
- Upgrade Ambari to 2.4.0.0 and let it complete
- Open Ambari web UI and hit "Restart all required" under Actions menu
Result
The operation fails while trying to restart Metrics Collector as it tried to make a WebHDFS call while HDFS was not started:
Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 148, in <module> AmsCollector().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute method(env) File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 725, in restart self.start(env) File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 46, in start self.configure(env, action = 'start') # for security File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 41, in configure hbase('master', action) File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk return fn(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/hbase.py", line 213, in hbase dfs_type=params.dfs_type File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in action_create_on_execute self.action_delayed("create") File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 456, in action_delayed self.get_hdfs_resource_executor().action_delayed(action_name, self) File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 256, in action_delayed self._set_mode(self.target_status) File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 363, in _set_mode self.util.run_command(self.main_resource.resource.target, 'SETPERMISSION', method='PUT', permission=self.mode, assertable_result=False) File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 179, in run_command _, out, err = get_user_call_output(cmd, user=self.run_user, logoutput=self.logoutput, quiet=False) File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output raise Fail(err_msg) resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --negotiate -u : 'http://vsharma-eu-mt-5.openstacklocal:50070/webhdfs/v1/user/ams/hbase?op=SETPERMISSION&user.name=hdfs&permission=775' 1>/tmp/tmp8twcZt 2>/tmp/tmpLPih9a' returned 7. curl: (7) couldn't connect to host 401
Afterwards, restarted HDFS individually first and then hit "Restart all Required" - the operation was successful
Looks like the issue is because the order of restart is incorrect across the hosts, hence the dependent services don't come up upfront
Attachments
Attachments
Issue Links
- links to