  1. Apache Trafodion (Retired)
  2. TRAFODION-1897

dcscheck may fail if one of the nodes in zookeeper quorum is down


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: any
    • Fix Version/s: None
    • Component/s: connectivity-dcs
    • Labels: None

    Description

      Reported by Joshua Liu - thanks for finding this defect
      ===========================================

      During recent HA testing, when one ZooKeeper node is down, dcscheck may fail with an error like:

      Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /trafodion/dcs/master
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1496)
      at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:725)
      at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
      at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
      at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
      at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)

      My environment:
      1. Trafodion nodes: centosha-[3-6]
      2. ZooKeeper nodes: centosha-2, centosha-5, centosha-6
      3. If I take down node centosha-6, dcscheck gives the error. If I take down centosha-5 instead, the error does not appear.
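The difference between the two nodes suggests the client talks to one particular server. As a rough way to see which quorum members are reachable before running dcscheck, the sketch below probes each server with ZooKeeper's "ruok" four-letter command. It assumes nc is installed and that "ruok" is allowed on the servers; zk_probe and zk_first_healthy are hypothetical helper names, not part of DCS.

```shell
# Probe one ZooKeeper server with the "ruok" four-letter command.
# Assumption: nc is available and "ruok" is permitted on the server.
zk_probe() {
    # $1 = host, $2 = port; a healthy server answers "imok".
    reply=$(printf 'ruok' | nc -w 2 "$1" "$2" 2>/dev/null)
    [ "$reply" = "imok" ]
}

# Print the first host:port argument that answers the probe.
# Returns non-zero if none of them do.
zk_first_healthy() {
    for server in "$@"; do
        host=${server%:*}
        port=${server##*:}
        if zk_probe "$host" "$port"; then
            printf '%s\n' "$server"
            return 0
        fi
    done
    return 1
}
```

For the environment above this could be called as `zk_first_healthy centosha-2:2181 centosha-5:2181 centosha-6:2181`; port 2181 is taken from the zkcli output in this report and is assumed to be the same on every node.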

      After checking the code, we found that dcscheck runs:
      echo "ls $dcsznode"|$DCS_INSTALL_DIR/bin/dcs zkcli > $dcstmp
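One low-risk improvement would be for the script to recognise the failure instead of parsing a file that contains no znode listing. The sketch below checks the captured output for the exception name seen in this report. It assumes stderr is also redirected into the file (for example with `2>&1`), since the command above captures only stdout; zk_output_has_connection_loss is a hypothetical helper name.

```shell
# Return success (0) if the captured zkcli output shows a lost connection.
# Assumption: the file contains stderr as well as stdout.
zk_output_has_connection_loss() {
    # $1 = path of the file holding the zkcli output (e.g. "$dcstmp")
    grep -q 'ConnectionLossException' "$1"
}
```

A caller could then print a short message such as "ZooKeeper connection lost" and exit non-zero rather than continuing with an empty result.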

      Every time I ran dcs zkcli manually, it tried to connect to the ZooKeeper server on node centosha-6. Even when that node is down, ‘dcs zkcli’ still tries to connect to it:
      [trafodion@centosha-3 bin]$ dcs zkcli
      Connecting to centosha-6.novalocal:2181
      Welcome to ZooKeeper!
      JLine support is enabled
      [zk: centosha-6.novalocal:2181(CONNECTING) 0] ls /
      Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1496)
      at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:725)
      at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
      at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
      at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
      at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
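A possible workaround, sketched here and not the project's actual fix, is to try the listing against each quorum member in turn until one answers. ZooKeeper's command-line client accepts `-server host:port`; whether `dcs zkcli` passes that option through is an assumption that would need checking. zk_cli and zk_ls_any are hypothetical names.

```shell
# Wrapper around the CLI so the real command can be swapped out.
# Assumption: "dcs zkcli" forwards -server to the ZooKeeper client.
zk_cli() {
    "$DCS_INSTALL_DIR/bin/dcs" zkcli -server "$1"
}

# Run "ls <znode>" against each server until one answers.
# $1 = znode path, remaining arguments = host:port values to try in order.
zk_ls_any() {
    znode=$1
    shift
    for server in "$@"; do
        # Capture stdout and stderr so a ConnectionLoss trace is visible.
        if out=$(echo "ls $znode" | zk_cli "$server" 2>&1) &&
           ! printf '%s\n' "$out" | grep -q 'ConnectionLoss'; then
            printf '%s\n' "$out"
            return 0
        fi
    done
    echo "no ZooKeeper server answered for $znode" >&2
    return 1
}
```

With this shape, a single unreachable server only costs one failed attempt, and dcscheck fails only when no listed server responds.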

            People

              arvind-narain Arvind Narain