AMBARI-21527: Restart of MR2 History Server failed due to wrong NameNode RPC address


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.5.2
    • Fix Version/s: 2.5.2
    • Component/s: ambari-server
    • Labels: None

    Description

      Note: this also happens on NameNode restart (Kerberos cluster) and on remote DS restart (non-secured cluster).

      Steps:

      • Installed BI 4.2 cluster on Ambari 2.2 with Slider and services it required
      • Upgraded Ambari to 2.5.2.0-146
      • Registered HDP 2.6.1.0 repo, installed packages
      • Restarted services that needed restart
      • Ran service checks
      • Started upgrade

      Result: the "Restarting History Server" step failed with the following error:

      errors-87.txt
      Traceback (most recent call last):
        File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py", line 134, in <module>
          HistoryServer().execute()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
          method(env)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 841, in restart
          self.pre_upgrade_restart(env, upgrade_type=upgrade_type)
        File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py", line 85, in pre_upgrade_restart
          copy_to_hdfs("mapreduce", params.user_group, params.hdfs_user, skip=params.sysprep_skip_copy_tarballs_hdfs)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/copy_tarball.py", line 267, in copy_to_hdfs
          replace_existing_files=replace_existing_files,
        File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
          self.env.run()
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
          self.run_action(resource, action)
        File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
          provider_action()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 560, in action_create_on_execute
          self.action_delayed("create")
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 557, in action_delayed
          self.get_hdfs_resource_executor().action_delayed(action_name, self)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 292, in action_delayed
          self._create_resource()
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 308, in _create_resource
          self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 423, in _create_file
          self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
        File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 204, in run_command
          raise Fail(err_msg)
      resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --data-binary @/usr/hdp/2.6.1.0-129/hadoop/mapreduce.tar.gz -H 'Content-Type: application/octet-stream' 'http://c7301.ambari.apache.org:50070/webhdfs/v1/hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz?op=CREATE&user.name=hdfs&overwrite=True&permission=444'' returned status_code=403. 
      {
        "RemoteException": {
          "exception": "ConnectException", 
          "javaClassName": "java.net.ConnectException", 
          "message": "Call From c7301.ambari.apache.org/192.168.73.101 to c7301.ambari.apache.org:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused"
        }
      }
      
      NameNode log, pre-upgrade restart
      2017-07-18 07:48:05,435 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(397)) - fs.defaultFS is hdfs://c7301.ambari.apache.org:8020
      2017-07-18 07:48:05,436 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(417)) - Clients are to use c7301.ambari.apache.org:8020 to access this namenode/service.
      2017-07-18 07:48:07,343 INFO  namenode.NameNode (NameNodeRpcServer.java:<init>(342)) - RPC server is binding to c7301.ambari.apache.org:8020
      2017-07-18 07:48:07,434 INFO  namenode.NameNode (NameNode.java:startCommonServices(695)) - NameNode RPC up at: c7301.ambari.apache.org/192.168.73.101:8020
      
      NameNode log, in-upgrade restart
      2017-07-18 09:03:42,336 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(450)) - fs.defaultFS is hdfs://c7301.ambari.apache.org:8020
      2017-07-18 09:03:42,337 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(470)) - Clients are to use c7301.ambari.apache.org:8020 to access this namenode/service.
      2017-07-18 09:03:44,686 INFO  namenode.NameNode (NameNodeRpcServer.java:<init>(428)) - RPC server is binding to localhost:8020
      2017-07-18 09:03:44,995 INFO  namenode.NameNode (NameNode.java:startCommonServices(876)) - NameNode RPC up at: localhost/127.0.0.1:8020
      

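      It looks like something during the upgrade reconfigures the NameNode RPC server to bind to localhost only. Before the upgrade the log shows "RPC server is binding to c7301.ambari.apache.org:8020"; during the upgrade it shows "RPC server is binding to localhost:8020", while fs.defaultFS still points clients at c7301.ambari.apache.org:8020. Calls to that address are therefore refused, which is what the WebHDFS CREATE request above reports.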
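
      The comparison between the two NameNode logs can be automated. The following is a minimal sketch, not part of Ambari or of either attached patch: it assumes the log contains the "fs.defaultFS is hdfs://..." and "RPC server is binding to ..." lines in the format shown above, and the function name check_rpc_binding is hypothetical.

```python
import re

# Patterns for the two NameNode log lines quoted in this issue (format assumed
# to match the excerpts above; other Hadoop versions may log differently).
DEFAULT_FS_RE = re.compile(r"fs\.defaultFS is hdfs://(?P<host>[^:/\s]+)")
BIND_RE = re.compile(r"RPC server is binding to (?P<host>[^:\s]+):(?P<port>\d+)")

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def check_rpc_binding(log_text):
    """Return (advertised_host, bound_host, is_mismatch) for a NameNode log.

    The last occurrence of each line wins, i.e. the most recent start.
    A mismatch means clients are told to use a real host name while the
    RPC server is bound to a loopback address only.
    """
    advertised = None
    bound = None
    for line in log_text.splitlines():
        match = DEFAULT_FS_RE.search(line)
        if match:
            advertised = match.group("host")
        match = BIND_RE.search(line)
        if match:
            bound = match.group("host")
    is_mismatch = (
        advertised is not None
        and bound is not None
        and bound in LOOPBACK_HOSTS
        and advertised not in LOOPBACK_HOSTS
    )
    return advertised, bound, is_mismatch


if __name__ == "__main__":
    # Lines copied from the in-upgrade restart log in this issue.
    sample = (
        "2017-07-18 09:03:42,336 INFO  namenode.NameNode - fs.defaultFS is "
        "hdfs://c7301.ambari.apache.org:8020\n"
        "2017-07-18 09:03:44,686 INFO  namenode.NameNode - RPC server is "
        "binding to localhost:8020\n"
    )
    print(check_rpc_binding(sample))
```

      Running the check against the pre-upgrade log should report no mismatch, and against the in-upgrade log it should flag one, which narrows the problem to whatever the upgrade changes in the NameNode's RPC bind settings.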

      Attachments

        1. AMBARI-21527-HA_and_NonHA.patch
          6 kB
          Di Li
        2. AMBARI-21527.patch
          3 kB
          Siddharth Wagle


            People

              dili Di Li
              swagle Siddharth Wagle