Ambari / AMBARI-19204

Metrics Monitor start failed after deleting AMS and reinstalling it with a different user


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.5.0
    • Component/s: ambari-metrics
    • Labels: None

    Description

      Steps to reproduce:
      1) Delete the Ambari Metrics (AMS) service along with Tez, HBase, Sqoop, Oozie, Falcon, Storm, Ambari Infra, Kafka, Knox, Log Search, SmartSense, Mahout, and Slider.
      2) Add all the deleted services back.

      The Metrics Monitor then fails to start with:

      Traceback (most recent call last):
        File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 68, in <module>
          AmsMonitor().execute()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 282, in execute
          method(env)
        File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 42, in start
          action = 'start'
        File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
          return fn(*args, **kwargs)
        File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/ams_service.py", line 103, in ams_service
          user=params.ams_user
        File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
          self.env.run()
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
          self.run_action(resource, action)
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
          provider_action()
        File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
          tries=self.resource.tries, try_sleep=self.resource.try_sleep)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
          result = function(command, **kwargs)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
          tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
          result = _call(command, **kwargs_copy)
        File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
          raise ExecutionFailed(err_msg, code, out, err)
      resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start' returned 255. ######## Hortonworks #############
      This is MOTD message, added for testing in qe infra
      psutil build directory is not empty, continuing...
      Verifying Python version compatibility...
      Using python  /usr/bin/python2.6
      Checking for previously running Metric Monitor...
      Starting ambari-metrics-monitor
      /usr/sbin/ambari-metrics-monitor: line 148: /grid/0/log/metric_monitor/ambari-metrics-monitor.out: Permission denied
      Verifying ambari-metrics-monitor process status...
      ERROR: ambari-metrics-monitor start failed. For more details, see /grid/0/log/metric_monitor/ambari-metrics-monitor.out:
      ====================
      2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
      2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
      2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
      2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
      2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
      2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
      2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
      2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
      2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
      2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
      ====================
      Monitor out at: /grid/0/log/metric_monitor/ambari-metrics-monitor.out
      stdout:   /var/lib/ambari-agent/data/output-1028.txt
      
      2016-12-14 06:12:10,119 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
      2016-12-14 06:12:10,432 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
      2016-12-14 06:12:10,433 - Group['cstm-knox-group'] {}
      2016-12-14 06:12:10,434 - Group['hadoop'] {}
      2016-12-14 06:12:10,435 - Group['users'] {}
      2016-12-14 06:12:10,435 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,436 - User['infra-solr'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,437 - User['cstm-sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,438 - User['cstm-ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,439 - User['cstm-tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
      2016-12-14 06:12:10,441 - User['cstm-storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,442 - User['cstm-knox'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,443 - User['cstm-flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,444 - User['cstm-mahout'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,444 - User['cstm-hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,445 - User['logsearch'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,446 - User['cstm-falcon'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
      2016-12-14 06:12:10,447 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
      2016-12-14 06:12:10,448 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,449 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,450 - User['cstm-oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
      2016-12-14 06:12:10,451 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,452 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
      2016-12-14 06:12:10,453 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
      2016-12-14 06:12:10,612 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
      2016-12-14 06:12:10,626 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
      2016-12-14 06:12:10,627 - Directory['/tmp/hbase-hbase'] {'owner': 'cstm-hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
      2016-12-14 06:12:10,826 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
      2016-12-14 06:12:10,963 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase /home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u cstm-hbase) -gt 1000) || (false)'}
      2016-12-14 06:12:10,983 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase /home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase'] due to not_if
      2016-12-14 06:12:10,984 - Group['hdfs'] {}
      2016-12-14 06:12:10,984 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'hdfs']}
      2016-12-14 06:12:10,985 - FS Type: 
      2016-12-14 06:12:10,985 - Directory['/etc/hadoop'] {'mode': 0755}
      2016-12-14 06:12:11,068 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'root', 'group': 'hadoop'}
      2016-12-14 06:12:11,192 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
      2016-12-14 06:12:11,296 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
      2016-12-14 06:12:11,317 - Skipping Execute[('setenforce', '0')] due to not_if
      2016-12-14 06:12:11,317 - Directory['/grid/0/log/hdfs'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
      2016-12-14 06:12:11,603 - Directory['/grid/0/pid/hdfs'] {'owner': 'root', 'create_parents': True, 'group': 'root', 'cd_access': 'a'}
      2016-12-14 06:12:11,671 - Changing owner for /grid/0/pid/hdfs from 1021 to root
      2016-12-14 06:12:11,671 - Changing group for /grid/0/pid/hdfs from 1006 to root
      2016-12-14 06:12:11,861 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'cd_access': 'a'}
      2016-12-14 06:12:12,019 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'root'}
      2016-12-14 06:12:12,143 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'root'}
      2016-12-14 06:12:12,248 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
      2016-12-14 06:12:12,380 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
      2016-12-14 06:12:12,482 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
      2016-12-14 06:12:12,597 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
      2016-12-14 06:12:12,672 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
      2016-12-14 06:12:12,823 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
      2016-12-14 06:12:13,461 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
      2016-12-14 06:12:13,466 - checked_call['hostid'] {}
      2016-12-14 06:12:13,485 - checked_call returned (0, '1bac0d12')
      2016-12-14 06:12:13,488 - Directory['/etc/ambari-metrics-monitor/conf'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True}
      2016-12-14 06:12:13,581 - Directory['/grid/0/log/metric_monitor'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755}
      2016-12-14 06:12:13,693 - Directory['/grid/0/pid/metric_monitor'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
      2016-12-14 06:12:13,971 - Directory['/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'cd_access': 'a'}
      2016-12-14 06:12:14,387 - Execute['ambari-sudo.sh chown -R cstm-ams:hadoop /usr/lib/python2.6/site-packages/resource_monitoring'] {}
      2016-12-14 06:12:14,411 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_monitor.ini'] {'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
      2016-12-14 06:12:14,421 - File['/etc/ambari-metrics-monitor/conf/metric_monitor.ini'] {'content': Template('metric_monitor.ini.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode': None}
      2016-12-14 06:12:14,549 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_groups.conf'] {'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
      2016-12-14 06:12:14,551 - File['/etc/ambari-metrics-monitor/conf/metric_groups.conf'] {'content': Template('metric_groups.conf.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode': None}
      2016-12-14 06:12:14,672 - File['/etc/ambari-metrics-monitor/conf/ams-env.sh'] {'content': InlineTemplate(...), 'owner': 'cstm-ams'}
      2016-12-14 06:12:14,814 - Execute['/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start'] {'user': 'cstm-ams'}
      2016-12-14 06:12:16,884 - Execute['find /grid/0/log/metric_monitor -maxdepth 1 -type f -name '*' -exec echo '==> {} <==' \; -exec tail -n 40 {} \;'] {'logoutput': True, 'ignore_failures': True, 'user': 'cstm-ams'}
      ######## Hortonworks #############
      This is MOTD message, added for testing in qe infra
      ==> /grid/0/log/metric_monitor/ambari-metrics-monitor.out <==
      2016-12-14 05:35:21,946 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
      2016-12-14 05:35:27,256 [INFO] emitter.py:152 - Calculated collector shard based on hostname : ctr-e83-1481604818073-0640-01-000006.hwx.site
      

      NOTE: During the initial cluster installation, AMS was installed as user ams. When AMS was re-added, it was installed as the custom user cstm-ams.
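      One plausible reading of the log, stated here as an assumption rather than the confirmed root cause: the Directory resource for /grid/0/log/metric_monitor sets owner cstm-ams on the directory itself, which presumably leaves files from the earlier installation (such as ambari-metrics-monitor.out) owned by ams. That would match the "Permission denied" reported at line 148 of /usr/sbin/ambari-metrics-monitor. The minimal diagnostic sketch below lists regular files in the log directory whose owner differs from the expected user. The helper names find_foreign_owned and owner_name are hypothetical; the path and user name are taken from this report.

```python
import os
import pwd


def owner_name(uid):
    """Map a numeric uid to a user name, falling back to the uid itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def find_foreign_owned(log_dir, expected_uid):
    """Return (path, owner) for regular files directly inside log_dir
    (non-recursive) that are NOT owned by expected_uid."""
    mismatches = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        st = os.stat(path)
        if st.st_uid != expected_uid:
            mismatches.append((path, owner_name(st.st_uid)))
    return mismatches


def main():
    # Values from this report; adjust for another host or user.
    log_dir = "/grid/0/log/metric_monitor"
    user = "cstm-ams"
    try:
        expected = pwd.getpwnam(user).pw_uid
    except KeyError:
        print("user %s not found on this host" % user)
        return
    if not os.path.isdir(log_dir):
        print("%s does not exist" % log_dir)
        return
    for path, owner in find_foreign_owned(log_dir, expected):
        print("%s is owned by %s, not %s" % (path, owner, user))


if __name__ == "__main__":
    main()
```

      On an affected host, output naming ambari-metrics-monitor.out as owned by ams would support this reading; an empty result would point elsewhere.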

      Attachments

        1. AMBARI-19204.patch
          0.7 kB
          Aravindan Vijayan


          People

            avijayan Aravindan Vijayan
            vrathod Vivek Rathod
            Votes: 0
            Watchers: 3
