Hadoop Common / HADOOP-9126

FormatZK and ZKFC startup can fail due to zkclient connection establishment delay

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 2.0.0-alpha
    • None
    • Component/s: auto-failover
    • None

    Description

      The formatZK and ZKFC startup flows proceed immediately after creating the ZooKeeper client connection, without waiting to confirm that the connection is fully established. If the connection is still incomplete when the next ZooKeeper operation is issued, that operation fails.

      Exception trace for format

      12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Socket connection established to HOST-xx-xx-xx-55/xx.xx.xx.55:2182, initiating session
      12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Session establishment complete on server HOST-xx-xx-xx-55/xx.xx.xx.55:2182, sessionid = 0x1379da4660c0014, negotiated timeout = 5000
      12/05/30 19:48:24 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x1379da4660c0014
      12/05/30 19:48:24 INFO zookeeper.ZooKeeper: Session: 0x1379da4660c0014 closed
      12/05/30 19:48:24 INFO zookeeper.ClientCnxn: EventThread shut down
      Exception in thread "main" java.io.IOException: Couldn't determine existence of znode '/hadoop-ha/hacluster'
              at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263)
              at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:257)
              at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:195)
              at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
              at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:163)
              at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:159)
              at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
              at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:159)
              at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:171)
      Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hadoop-ha/hacluster
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
              at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
              at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
              at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261)
              ... 8 more
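
The race described above can be avoided by blocking until the client signals that it is connected before issuing the first operation. The following is a minimal sketch of that idea using only the JDK; `ConnectionGate`, `markConnected`, and `awaitConnected` are hypothetical names chosen for illustration and are not taken from the attached patches. With the ZooKeeper client, the signal would typically come from the `Watcher` passed to the `ZooKeeper` constructor when it receives an event whose state is `KeeperState.SyncConnected`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (an illustration, not the committed fix): lets a caller
 * block until an asynchronous client reports that its connection is up.
 */
class ConnectionGate {
    // Opens exactly once, when the first "connected" signal arrives.
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Call this from the client's event callback once the session is connected. */
    void markConnected() {
        connected.countDown();
    }

    /**
     * Waits up to timeoutMs milliseconds for markConnected() to be called.
     * Returns true if the signal arrived in time, false on timeout or interruption.
     */
    boolean awaitConnected(long timeoutMs) {
        try {
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

Under these assumptions, a caller would create the gate before constructing the client, have the watcher call `markConnected()` on the connected event, and call `awaitConnected(...)` with a bound such as the session timeout before the first `exists()` call, failing with a clear error if it returns false.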
      
      

      Attachments

        1. hdfs-3477.txt
          11 kB
          Todd Lipcon
        2. hdfs-3477.txt
          11 kB
          Todd Lipcon
        3. HDFS-3477.3.patch
          9 kB
          Rakesh Radhakrishnan
        4. HDFS-3477.3.patch
          9 kB
          Rakesh Radhakrishnan
        5. HDFS-3477.2.patch
          7 kB
          Rakesh Radhakrishnan
        6. HDFS-3477.1.patch
          11 kB
          Rakesh Radhakrishnan
        7. HDFS-3477.patch
          4 kB
          Rakesh Radhakrishnan

            People

              Assignee: Rakesh Radhakrishnan (rakeshr)
              Reporter: suja s (suja)
              Votes: 0
              Watchers: 12
