Hadoop YARN / YARN-8409

ActiveStandbyElectorBasedElectorService is failing with NPE


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: None
    • Labels: None

    Description

      In an RM HA environment, kill the ZooKeeper leader and then perform an RM failover.

      Sometimes the active RM hits a NullPointerException and fails to come up successfully:

      
      2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL mechanism.
      2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
      2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
      java.net.ConnectException: Connection refused
          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
          at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
      2018-06-08 10:31:03,344 INFO  service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService failed in state INITED
      java.lang.NullPointerException
          at org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
          at org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
          at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
          at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
          at org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
          at org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
          at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
          at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
          at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
          at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
          at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
          at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
      2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:quitElection(409)) - Yielding from election
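
The trace shows the NPE is raised inside the anonymous action (`ActiveStandbyElector$3.run`) that `createWithRetries` hands to `zkDoWithRetries`, while `ensureParentZNode` runs during `serviceInit`, just after the ZooKeeper client logged `Connection refused`. The report does not say which reference is null; one plausible reading, offered here only as an assumption, is that the action dereferences a ZooKeeper client handle that is not set at that moment. The Java sketch below illustrates that failure mode and a defensive retry pattern: the client is looked up once per attempt and null-checked, so a missing client becomes a retryable error instead of an NPE. It is not the `YARN-8409.002.patch` fix. `FakeZkClient`, `ConnectionLostException`, `ZkAction`, `doWithRetries`, `reconnectingAfter` and `tryCreate` are hypothetical names, and the znode path is illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch, NOT the ActiveStandbyElector code or the YARN-8409 patch.
// FakeZkClient stands in for a ZooKeeper client handle that may be null
// while the connection is down.
public class ElectorRetrySketch {

    /** Hypothetical stand-in for a ZooKeeper client. */
    static final class FakeZkClient {
        String create(String path) {
            return "created " + path;
        }
    }

    /** Hypothetical retryable failure: no usable client for this attempt. */
    static final class ConnectionLostException extends Exception {
        ConnectionLostException(String msg) {
            super(msg);
        }
    }

    /** One ZooKeeper operation, always given a non-null client. */
    @FunctionalInterface
    interface ZkAction<T> {
        T run(FakeZkClient client) throws ConnectionLostException;
    }

    /**
     * Runs the action up to maxAttempts times. The client is looked up once per
     * attempt and null-checked before use, so a missing client becomes a
     * retryable ConnectionLostException instead of a NullPointerException.
     */
    static <T> T doWithRetries(Supplier<FakeZkClient> currentClient,
                               ZkAction<T> action,
                               int maxAttempts) throws ConnectionLostException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectionLostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            FakeZkClient client = currentClient.get(); // snapshot for this attempt
            if (client == null) {
                last = new ConnectionLostException("no ZooKeeper client on attempt " + attempt);
                continue;
            }
            try {
                return action.run(client);
            } catch (ConnectionLostException e) {
                last = e;
            }
        }
        throw last; // non-null: at least one attempt ran and failed
    }

    /** Simulates a client that is missing for the first n lookups, then reconnects. */
    static Supplier<FakeZkClient> reconnectingAfter(int n) {
        AtomicInteger lookups = new AtomicInteger();
        FakeZkClient live = new FakeZkClient();
        return () -> lookups.getAndIncrement() < n ? null : live;
    }

    /** Demo helper: returns the create result or the failure message. */
    static String tryCreate(int disconnectedLookups, int maxAttempts) {
        try {
            return doWithRetries(reconnectingAfter(disconnectedLookups),
                    client -> client.create("/yarn-leader-election"), maxAttempts);
        } catch (ConnectionLostException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate(2, 3)); // created /yarn-leader-election
        System.out.println(tryCreate(2, 2)); // failed: no ZooKeeper client on attempt 2
    }
}
```

Reading the handle into a local once per attempt matters when another thread (for example, a connection-loss callback) can reset the shared field: checking the field and then reading it a second time would leave a window for the same NPE.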

      Attachments

        1. YARN-8409.002.patch (4 kB, Chandni Singh)


          People

            Assignee: Chandni Singh (csingh)
            Reporter: Yesha Vora (yeshavora)
            Votes: 0
            Watchers: 4
