  1. Ambari
  2. AMBARI-18260

Journal node restart failing on RU from HDP 2.4.x to 2.5 on Wire Encrypted cluster

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: trunk
    • Component/s: None
    • Labels: None

    Description

      Type of upgrade: Rolling Upgrade (RU)
      Upgrade from HDP 2.4.2.0 to 2.5 on a secure, wire-encrypted cluster

      JournalNode logs show:

      org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write, no segment open
      at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:484)
      at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:353)
      at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:152)
      at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
      at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
      2016-08-25 07:19:47,651 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /grid/0/hadoop/hdfs/journal/nameservice/current/edits_inprogress_0000000000000073697 -> /grid/0/hadoop/hdfs/journal/nameservice/current/edits_0000000000000073697-0000000000000073698

      Error reported by the failing RU task:

      Traceback (most recent call last):
      File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/journalnode.py", line 198, in <module>
      JournalNode().execute()
      File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
      method(env)
      File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 731, in restart
      self.post_upgrade_restart(env, upgrade_type=upgrade_type)
      File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/journalnode.py", line 75, in post_upgrade_restart
      journalnode_upgrade.post_upgrade_check()
      File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/journalnode_upgrade.py", line 64, in post_upgrade_check
      namenode_ha.is_encrypted(), params.security_enabled)
      File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 306, in get_jmx_data
      data = urllib2.urlopen(nn_address).read()
      File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
      return _opener.open(url, data, timeout)
      File "/usr/lib64/python2.6/urllib2.py", line 391, in open
      response = self._open(req, data)
      File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
      '_open', req)
      File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
      result = func(*args)
      File "/usr/lib64/python2.6/urllib2.py", line 1194, in https_open
      return self.do_open(httplib.HTTPSConnection, req)
      File "/usr/lib64/python2.6/urllib2.py", line 1161, in do_open
      raise URLError(err)
      urllib2.URLError: <urlopen error [Errno 1] _ssl.c:491: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>
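
The traceback above ends in a TLS handshake failure while `get_jmx_data` reads NameNode JMX over HTTPS. The ticket does not state the root cause, and the block below is not the attached patch. It is only a minimal sketch, assuming a protocol-version mismatch between client and server, of how a client on a modern Python 3 could pin a TLS floor before fetching JMX JSON. The helper names (`make_tls_context`, `extract_attribute`, `get_jmx_value`) and the bean and attribute used in the usage note are illustrative assumptions, not Ambari code.

```python
import json
import ssl
import urllib.request


def make_tls_context(verify=True):
    """Build a client SSL context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if not verify:
        # Only for test clusters with self-signed certificates.
        # check_hostname must be disabled before verify_mode is relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def extract_attribute(payload, bean_name, attribute):
    """Return one attribute of one bean from a parsed JMX JSON document."""
    for bean in payload.get("beans", []):
        if bean.get("name") == bean_name:
            return bean.get(attribute)
    return None


def get_jmx_value(jmx_url, bean_name, attribute, verify=True, timeout=10):
    """Fetch a JMX JSON document over HTTPS and pull out a single attribute."""
    context = make_tls_context(verify)
    with urllib.request.urlopen(jmx_url, context=context, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return extract_attribute(payload, bean_name, attribute)
```

A call might look like `get_jmx_value(nn_jmx_url, "Hadoop:service=NameNode,name=NameNodeInfo", "JournalTransactionInfo")`, where `nn_jmx_url` is a placeholder for the NameNode's HTTPS JMX endpoint. On a Kerberized cluster the endpoint would typically also require SPNEGO authentication, which this sketch does not handle.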

      Live cluster:
      <https://172.22.99.206:8443/#/main/admin/stack/upgrade>

      Artifacts:
      <http://qelog.hortonworks.com/log/nats11-36-zvrs-dgm10toerienoha-s11/test-logs/ambariru-dgm10toerie-sec-noha/ambaritestartifacts/artifacts/screenshots/com.hw.ambari.ui.tests.monitoring.admin_page.TestQuickRollingUpgradeApi/test060_StartPerformUpgrade/_24_22_9_0_One_step_of_upgrade_failed_after_retry_group_UpgradeGroup_completedtaskCount_4__name_CORE/>

      Attachments

        1. AMBARI-18260.patch
          1 kB
          Andrew Onischuk

            People

              Assignee: Andrew Onischuk (aonishuk)
              Reporter: Shreya Bhat (shreyabhatm@gmail.com)