Hadoop Common / HADOOP-9126

FormatZK and ZKFC startup can fail due to zkclient connection establishment delay


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: auto-failover
    • Labels: None

      Description

      The format and ZKFC startup flows proceed immediately after creating the zkclient connection, without waiting to confirm that the connection has been fully established. If the connection is still incomplete when the next ZooKeeper operation is issued, that operation fails.

      Exception trace for format

      12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Socket connection established to HOST-xx-xx-xx-55/xx.xx.xx.55:2182, initiating session
      12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Session establishment complete on server HOST-xx-xx-xx-55/xx.xx.xx.55:2182, sessionid = 0x1379da4660c0014, negotiated timeout = 5000
      12/05/30 19:48:24 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x1379da4660c0014
      12/05/30 19:48:24 INFO zookeeper.ZooKeeper: Session: 0x1379da4660c0014 closed
      12/05/30 19:48:24 INFO zookeeper.ClientCnxn: EventThread shut down
      Exception in thread "main" java.io.IOException: Couldn't determine existence of znode '/hadoop-ha/hacluster'
              at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263)
              at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:257)
              at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:195)
              at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
              at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:163)
              at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:159)
              at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
              at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:159)
              at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:171)
      Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hadoop-ha/hacluster
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
              at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
              at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
              at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261)
              ... 8 more
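
The race described above can be avoided by blocking until a "connected" notification arrives before issuing the first request. The following is a minimal sketch of that pattern only, not the code from the attached patches. It uses just the JDK so it runs without a ZooKeeper dependency; `ConnectWaitSketch`, `ConnectionGate`, `markConnected` and `awaitConnected` are hypothetical names, and the background thread stands in for the client's event thread. With the real ZooKeeper client, the analogous signal would be a watcher event whose state is `SyncConnected`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectWaitSketch {

    /** Hypothetical helper: blocks callers until a "connected" event has arrived. */
    static final class ConnectionGate {
        private final CountDownLatch connected = new CountDownLatch(1);

        /** Intended to be called from the client's event callback once the session is established. */
        void markConnected() {
            connected.countDown();
        }

        /** Returns true if connected within timeoutMs, false on timeout or interruption. */
        boolean awaitConnected(long timeoutMs) {
            try {
                return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ConnectionGate gate = new ConnectionGate();

        // Simulated event thread: session establishment completes asynchronously.
        Thread eventThread = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            gate.markConnected();
        });
        eventThread.start();

        // Wait for the connection before doing anything that needs it
        // (for example, the znode existence check in the trace above).
        if (!gate.awaitConnected(5000)) {
            throw new IllegalStateException("Timed out waiting for connection");
        }
        System.out.println("connected; safe to issue the first request");
    }
}
```

A latch is used here because the "connected" signal is a one-shot event that may arrive before or after the caller starts waiting; either order is handled. The timeout value is an arbitrary choice for the sketch.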
      
      

        Attachments

        1. HDFS-3477.patch
          4 kB
          Rakesh Radhakrishnan
        2. HDFS-3477.1.patch
          11 kB
          Rakesh Radhakrishnan
        3. HDFS-3477.2.patch
          7 kB
          Rakesh Radhakrishnan
        4. HDFS-3477.3.patch
          9 kB
          Rakesh Radhakrishnan
        5. HDFS-3477.3.patch
          9 kB
          Rakesh Radhakrishnan
        6. hdfs-3477.txt
          11 kB
          Todd Lipcon
        7. hdfs-3477.txt
          11 kB
          Todd Lipcon

              People

              • Assignee: Rakesh Radhakrishnan (rakeshr)
              • Reporter: suja s (suja)
              • Votes: 0
              • Watchers: 14
