Ambari / AMBARI-10317

Knox gateway fails to restart on Ubuntu 12.04 after system restart because /var/run/knox is deleted

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: ambari-server
    • Labels: None
    • Environment: ubuntu 12.04

    Description

      We are testing a deployment of an HDP 2.2 cluster using Ambari 2.0.0-rc2 on Ubuntu 12.04. I've been able to set up a cluster running HDFS, MapReduce2, YARN, ZooKeeper, Knox, Ranger, and Ambari Metrics. When I shut down the whole cluster using Actions -> Stop All in Ambari, reboot the hosts, and then try to restart the cluster, I see the error below when restarting the Knox gateway. The directory /var/run/knox is indeed missing on the master host.

      Knox Gateway startup log:

      2015-04-01 16:17:12,075 - Error while executing command 'start':
      Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 214, in execute
      method(env)
      File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
      return fn(*args, **kwargs)
      File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox_gateway.py", line 80, in start
      self.configure(env)
      File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox_gateway.py", line 64, in configure
      knox()
      File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
      return fn(*args, **kwargs)
      File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox.py", line 99, in knox
      sudo = True,
      File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
      self.env.run()
      File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
      self.run_action(resource, action)
      File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
      provider_action()
      File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 274, in action_run
      raise ex
      Fail: Execution of 'chown -R knox:knox /var/lib/knox/data /var/log/knox /var/log/knox /var/run/knox /etc/knox/conf' returned 1. chown: cannot access `/var/run/knox': No such file or directory

      stdout: /var/lib/ambari-agent/data/output-107.txt

      2015-04-01 16:17:06,744 - u"Group['hadoop']" {'ignore_failures': False}
      2015-04-01 16:17:06,744 - Modifying group hadoop
      2015-04-01 16:17:06,797 - u"Group['users']" {'ignore_failures': False}
      2015-04-01 16:17:06,797 - Modifying group users
      2015-04-01 16:17:06,839 - u"Group['knox']" {'ignore_failures': False}
      2015-04-01 16:17:06,839 - Modifying group knox
      2015-04-01 16:17:06,886 - u"Group['ranger']" {'ignore_failures': False}
      2015-04-01 16:17:06,886 - Modifying group ranger
      2015-04-01 16:17:06,930 - u"User['mapred']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:06,930 - Modifying user mapred
      2015-04-01 16:17:06,976 - u"User['root']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:06,977 - Modifying user root
      2015-04-01 16:17:07,019 - u"User['ambari-qa']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
      2015-04-01 16:17:07,020 - Modifying user ambari-qa
      2015-04-01 16:17:07,066 - u"User['zookeeper']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:07,066 - Modifying user zookeeper
      2015-04-01 16:17:07,109 - u"User['rangerlogger']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:07,110 - Modifying user rangerlogger
      2015-04-01 16:17:07,152 - u"User['hdfs']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:07,152 - Modifying user hdfs
      2015-04-01 16:17:07,195 - u"User['knox']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:07,195 - Modifying user knox
      2015-04-01 16:17:07,238 - u"User['ranger']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:07,238 - Modifying user ranger
      2015-04-01 16:17:07,282 - u"User['yarn']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:07,283 - Modifying user yarn
      2015-04-01 16:17:07,326 - u"User['ams']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:07,327 - Modifying user ams
      2015-04-01 16:17:07,370 - u"User['rangeradmin']" {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
      2015-04-01 16:17:07,370 - Modifying user rangeradmin
      2015-04-01 16:17:07,413 - u"File['/var/lib/ambari-agent/data/tmp/changeUid.sh']" {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
      2015-04-01 16:17:07,686 - u"Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa']" {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
      2015-04-01 16:17:07,728 - Skipping u"Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa']" due to not_if
      2015-04-01 16:17:07,728 - u"Group['hdfs']" {'ignore_failures': False}
      2015-04-01 16:17:07,728 - Modifying group hdfs
      2015-04-01 16:17:07,774 - u"User['hdfs']" {'ignore_failures': False, 'groups': [u'hadoop', 'hadoop', 'hdfs', u'hdfs']}
      2015-04-01 16:17:07,775 - Modifying user hdfs
      2015-04-01 16:17:07,818 - u"Directory['/etc/hadoop']" {'mode': 0755}
      2015-04-01 16:17:07,974 - u"Directory['/etc/hadoop/conf.empty']" {'owner': 'root', 'group': 'hadoop', 'recursive': True}
      2015-04-01 16:17:08,110 - u"Link['/etc/hadoop/conf']" {'not_if': 'ls /etc/hadoop/conf', 'to': '/etc/hadoop/conf.empty'}
      2015-04-01 16:17:08,153 - Skipping u"Link['/etc/hadoop/conf']" due to not_if
      2015-04-01 16:17:08,160 - u"File['/etc/hadoop/conf/hadoop-env.sh']" {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
      2015-04-01 16:17:08,396 - u"Execute['('setenforce', '0')']" {'sudo': True, 'only_if': 'test -f /selinux/enforce'}
      2015-04-01 16:17:08,448 - Skipping u"Execute['('setenforce', '0')']" due to only_if
      2015-04-01 16:17:08,448 - u"Directory['/var/log/hadoop']" {'owner': 'root', 'mode': 0775, 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
      2015-04-01 16:17:08,843 - u"Directory['/var/run/hadoop']" {'owner': 'root', 'group': 'root', 'recursive': True, 'cd_access': 'a'}
      2015-04-01 16:17:08,886 - Creating directory u"Directory['/var/run/hadoop']"
      2015-04-01 16:17:09,066 - Changing group for /var/run/hadoop from 1000 to root
      2015-04-01 16:17:09,364 - u"Directory['/tmp/hadoop-hdfs']" {'owner': 'hdfs', 'recursive': True, 'cd_access': 'a'}
      2015-04-01 16:17:09,407 - Creating directory u"Directory['/tmp/hadoop-hdfs']"
      2015-04-01 16:17:09,587 - Changing owner for /tmp/hadoop-hdfs from 0 to hdfs
      2015-04-01 16:17:09,820 - u"File['/etc/hadoop/conf/commons-logging.properties']" {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
      2015-04-01 16:17:10,049 - u"File['/etc/hadoop/conf/health_check']" {'content': Template('health_check-v2.j2'), 'owner': 'hdfs'}
      2015-04-01 16:17:10,272 - u"File['/etc/hadoop/conf/log4j.properties']" {'content': '...', 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
      2015-04-01 16:17:10,506 - u"File['/etc/hadoop/conf/hadoop-metrics2.properties']" {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs'}
      2015-04-01 16:17:10,732 - u"File['/etc/hadoop/conf/task-log4j.properties']" {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
      2015-04-01 16:17:11,085 - u"Directory['/etc/knox/conf']" {'owner': 'knox', 'group': 'knox', 'recursive': True}
      2015-04-01 16:17:11,231 - u"XmlConfig['gateway-site.xml']" {'owner': 'knox', 'group': 'knox', 'conf_dir': '/etc/knox/conf', 'configuration_attributes': {}, 'configurations': ...}
      2015-04-01 16:17:11,239 - Generating config: /etc/knox/conf/gateway-site.xml
      2015-04-01 16:17:11,239 - u"File['/etc/knox/conf/gateway-site.xml']" {'owner': 'knox', 'content': InlineTemplate(...), 'group': 'knox', 'mode': None, 'encoding': 'UTF-8'}
      2015-04-01 16:17:11,422 - Writing u"File['/etc/knox/conf/gateway-site.xml']" because contents don't match
      2015-04-01 16:17:11,561 - u"File['/etc/knox/conf/gateway-log4j.properties']" {'content': '...', 'owner': 'knox', 'group': 'knox', 'mode': 0644}
      2015-04-01 16:17:11,790 - u"File['/etc/knox/conf/topologies/default.xml']" {'content': InlineTemplate(...), 'owner': 'knox', 'group': 'knox'}
      2015-04-01 16:17:12,014 - u"Execute['('chown', '-R', u'knox:knox', '/var/lib/knox/data', '/var/log/knox', '/var/log/knox', u'/var/run/knox', '/etc/knox/conf')']" {'sudo': True}

      2015-04-01 16:17:12,075 - Error while executing command 'start':
      Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 214, in execute
      method(env)
      File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
      return fn(*args, **kwargs)
      File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox_gateway.py", line 80, in start
      self.configure(env)
      File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox_gateway.py", line 64, in configure
      knox()
      File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
      return fn(*args, **kwargs)
      File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox.py", line 99, in knox
      sudo = True,
      File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
      self.env.run()
      File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
      self.run_action(resource, action)
      File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
      provider_action()
      File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 274, in action_run
      raise ex
      Fail: Execution of 'chown -R knox:knox /var/lib/knox/data /var/log/knox /var/log/knox /var/run/knox /etc/knox/conf' returned 1. chown: cannot access `/var/run/knox': No such file or directory
      2015-04-01 16:17:12,119 - Command: /usr/bin/hdp-select status knox-server > /tmp/tmp7GgVe1
      Output: knox-server - 2.2.0.0-2041
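
      The chown step in the log fails only because /var/run/knox no longer exists after the reboot. On Ubuntu 12.04, /var/run is typically a symlink to /run, which is a tmpfs and is emptied at boot. One possible approach, sketched here in plain Python and not taken from the attached patches, is to recreate any missing directories before changing ownership. The helper name ensure_directories and the 0755 mode are assumptions for illustration; the real knox.py would presumably use Ambari's own resource classes rather than calling os.makedirs directly.

```python
import os
import tempfile


def ensure_directories(paths, mode=0o755):
    """Create each directory in `paths` if it is missing.

    Returns the list of directories that had to be created, so the
    caller can log them. Hypothetical helper, not part of Ambari.
    """
    created = []
    for path in paths:
        if not os.path.isdir(path):
            # makedirs also creates missing parents, e.g. /var/run itself.
            os.makedirs(path, mode)
            created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate against a scratch directory instead of the real /var/run.
    scratch_root = tempfile.mkdtemp()
    pid_dir = os.path.join(scratch_root, "var", "run", "knox")
    print(ensure_directories([pid_dir]))  # first call creates the directory
    print(ensure_directories([pid_dir]))  # second call finds it in place
```

      Running a step like this ahead of the chown makes the start command safe to repeat after a reboot, because a directory that already exists is simply skipped.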

      Attachments

        1. AMBARI-10317.v2.patch
          3 kB
          Alejandro Fernandez
        2. AMBARI-10317.patch
          1 kB
          David McWhorter
        3. AMBARI-10317_branch-2.0.0.patch
          1 kB
          David McWhorter

            People

              Assignee: Alejandro Fernandez (afernandez)
              Reporter: David McWhorter (dmcwhorter)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: 0.5h
                  Remaining: 0.5h
                  Logged: Not Specified