Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
2.5.0
None
Description
STR:
1) Delete the AMS service along with Tez, HBase, Sqoop, Oozie, Falcon, Storm, Ambari Infra, Ambari Metrics, Kafka, Knox, Log Search, Smartsense, Mahout, Slider
2) Add all the deleted services back
Metrics collector fails to start with:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 68, in <module>
    AmsMonitor().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 282, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 42, in start
    action = 'start'
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/ams_service.py", line 103, in ams_service
    user=params.ams_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start' returned 255. ######## Hortonworks #############
This is MOTD message, added for testing in qe infra
psutil build directory is not empty, continuing...
Verifying Python version compatibility...
Using python /usr/bin/python2.6
Checking for previously running Metric Monitor...
Starting ambari-metrics-monitor
/usr/sbin/ambari-metrics-monitor: line 148: /grid/0/log/metric_monitor/ambari-metrics-monitor.out: Permission denied
Verifying ambari-metrics-monitor process status...
ERROR: ambari-metrics-monitor start failed. For more details, see /grid/0/log/metric_monitor/ambari-metrics-monitor.out:
====================
2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
====================
Monitor out at: /grid/0/log/metric_monitor/ambari-metrics-monitor.out
stdout: /var/lib/ambari-agent/data/output-1028.txt
2016-12-14 06:12:10,119 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-14 06:12:10,432 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-14 06:12:10,433 - Group['cstm-knox-group'] {}
2016-12-14 06:12:10,434 - Group['hadoop'] {}
2016-12-14 06:12:10,435 - Group['users'] {}
2016-12-14 06:12:10,435 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,436 - User['infra-solr'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,437 - User['cstm-sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,438 - User['cstm-ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,439 - User['cstm-tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,441 - User['cstm-storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,442 - User['cstm-knox'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,443 - User['cstm-flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,444 - User['cstm-mahout'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,444 - User['cstm-hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,445 - User['logsearch'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,446 - User['cstm-falcon'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,447 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,448 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,449 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,450 - User['cstm-oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,451 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,452 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,453 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-12-14 06:12:10,612 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-12-14 06:12:10,626 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-12-14 06:12:10,627 - Directory['/tmp/hbase-hbase'] {'owner': 'cstm-hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2016-12-14 06:12:10,826 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-12-14 06:12:10,963 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase /home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u cstm-hbase) -gt 1000) || (false)'}
2016-12-14 06:12:10,983 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase /home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase'] due to not_if
2016-12-14 06:12:10,984 - Group['hdfs'] {}
2016-12-14 06:12:10,984 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'hdfs']}
2016-12-14 06:12:10,985 - FS Type:
2016-12-14 06:12:10,985 - Directory['/etc/hadoop'] {'mode': 0755}
2016-12-14 06:12:11,068 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'root', 'group': 'hadoop'}
2016-12-14 06:12:11,192 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2016-12-14 06:12:11,296 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2016-12-14 06:12:11,317 - Skipping Execute[('setenforce', '0')] due to not_if
2016-12-14 06:12:11,317 - Directory['/grid/0/log/hdfs'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
2016-12-14 06:12:11,603 - Directory['/grid/0/pid/hdfs'] {'owner': 'root', 'create_parents': True, 'group': 'root', 'cd_access': 'a'}
2016-12-14 06:12:11,671 - Changing owner for /grid/0/pid/hdfs from 1021 to root
2016-12-14 06:12:11,671 - Changing group for /grid/0/pid/hdfs from 1006 to root
2016-12-14 06:12:11,861 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'cd_access': 'a'}
2016-12-14 06:12:12,019 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'root'}
2016-12-14 06:12:12,143 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'root'}
2016-12-14 06:12:12,248 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-12-14 06:12:12,380 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-12-14 06:12:12,482 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2016-12-14 06:12:12,597 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2016-12-14 06:12:12,672 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2016-12-14 06:12:12,823 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-12-14 06:12:13,461 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-14 06:12:13,466 - checked_call['hostid'] {}
2016-12-14 06:12:13,485 - checked_call returned (0, '1bac0d12')
2016-12-14 06:12:13,488 - Directory['/etc/ambari-metrics-monitor/conf'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True}
2016-12-14 06:12:13,581 - Directory['/grid/0/log/metric_monitor'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755}
2016-12-14 06:12:13,693 - Directory['/grid/0/pid/metric_monitor'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2016-12-14 06:12:13,971 - Directory['/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'cd_access': 'a'}
2016-12-14 06:12:14,387 - Execute['ambari-sudo.sh chown -R cstm-ams:hadoop /usr/lib/python2.6/site-packages/resource_monitoring'] {}
2016-12-14 06:12:14,411 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_monitor.ini'] {'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
2016-12-14 06:12:14,421 - File['/etc/ambari-metrics-monitor/conf/metric_monitor.ini'] {'content': Template('metric_monitor.ini.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode': None}
2016-12-14 06:12:14,549 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_groups.conf'] {'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
2016-12-14 06:12:14,551 - File['/etc/ambari-metrics-monitor/conf/metric_groups.conf'] {'content': Template('metric_groups.conf.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode': None}
2016-12-14 06:12:14,672 - File['/etc/ambari-metrics-monitor/conf/ams-env.sh'] {'content': InlineTemplate(...), 'owner': 'cstm-ams'}
2016-12-14 06:12:14,814 - Execute['/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start'] {'user': 'cstm-ams'}
2016-12-14 06:12:16,884 - Execute['find /grid/0/log/metric_monitor -maxdepth 1 -type f -name '*' -exec echo '==> {} <==' \; -exec tail -n 40 {} \;'] {'logoutput': True, 'ignore_failures': True, 'user': 'cstm-ams'}
######## Hortonworks #############
This is MOTD message, added for testing in qe infra
==> /grid/0/log/metric_monitor/ambari-metrics-monitor.out <==
2016-12-14 05:35:21,946 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:35:27,256 [INFO] emitter.py:152 - Calculated collector shard based on hostname : ctr-e83-1481604818073-0640-01-000006.hwx.site
NOTE: During the initial cluster installation, AMS was installed as user ams, but when AMS was re-added, it was installed with a custom user (cstm-ams).
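One reading of the logs, which this report does not confirm, is that files left under /grid/0/log/metric_monitor by the original ams user were never re-owned: the agent output shows the directory being assigned to cstm-ams, while the write to ambari-metrics-monitor.out fails with "Permission denied". A minimal sketch for checking that assumption on an affected host follows; the helper name files_not_owned_by is hypothetical and not part of Ambari.

```python
import os


def files_not_owned_by(root, expected_uid):
    """Return sorted paths under root (root included) whose owner uid
    differs from expected_uid. Hypothetical diagnostic helper."""
    mismatched = []
    # Check the top-level directory itself first.
    if os.lstat(root).st_uid != expected_uid:
        mismatched.append(root)
    # Then check every subdirectory and file below it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are reported by their own owner.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)
```

On a host from this report it could be called as files_not_owned_by('/grid/0/log/metric_monitor', pwd.getpwnam('cstm-ams').pw_uid) after importing pwd; a non-empty result would point at files still owned by the previous service user.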