ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1496

Ephemeral node not getting cleared even after client has exited

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 3.4.3
    • Fix Version/s: 3.4.4, 3.5.0
    • Component/s: server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.

      Zk version: 3.4.3

      Ephemeral node still exists in Zookeeper:

      HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date

      Tue Jun 26 16:07:04 IST 2012
      HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182
      Connecting to xx.xx.xx.55:2182
      Welcome to ZooKeeper!
      JLine support is enabled
      [zk: xx.xx.xx.55:2182(CONNECTING) 0]
      WATCHER::

      WatchedEvent state:SyncConnected type:None path:null

      [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock

      haclusternn2HOSt-xx-xx-xx-102 ��
      cZxid = 0x200000075
      ctime = Tue Jun 26 13:10:19 IST 2012
      mZxid = 0x200000075
      mtime = Tue Jun 26 13:10:19 IST 2012
      pZxid = 0x200000075
      cversion = 0
      dataVersion = 0
      aclVersion = 0
      ephemeralOwner = 0x1382791d4e50004
      dataLength = 42
      numChildren = 0
      [zk: xx.xx.xx.55:2182(CONNECTED) 1]

      Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed.

      HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074
      2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
      2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
      2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
      2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a

      HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075
      2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
      2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
      2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
      2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a

      Close session and create requests coming almost parallely.

      Env:
      Hadoop setup.
      We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
      NN102 was active and NN55 was standby.
      FailoverController at 102 got shut down due to ZK connection error.
      The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

      1. ZOOKEEPER-1496.patch
        0.7 kB
        Rakesh R
      2. ZOOKEEPER-1496.3.patch
        8 kB
        Rakesh R
      3. ZOOKEEPER-1496.2.patch
        7 kB
        Rakesh R
      4. ZOOKEEPER-1496.1.patch
        2 kB
        Rakesh R
      5. Logs.rar
        1.51 MB
        suja s

        Activity

        suja s created issue -
        suja s made changes -
        Field Original Value New Value
        Attachment Logs.rar [ 12533800 ]
        suja s made changes -
        Description In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.

        Zk version: 3.4.3

        Ephemeral node still exists in Zookeeper:

        HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date

        Tue Jun 26 16:07:04 IST 2012
        HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182
        Connecting to xx.xx.xx.55:2182
        Welcome to ZooKeeper!
        JLine support is enabled
        [zk: xx.xx.xx.55:2182(CONNECTING) 0]
        WATCHER::

        WatchedEvent state:SyncConnected type:None path:null

        [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock

        haclusternn2HOST-10-18-40-102 ��
        cZxid = 0x200000075
        ctime = Tue Jun 26 13:10:19 IST 2012
        mZxid = 0x200000075
        mtime = Tue Jun 26 13:10:19 IST 2012
        pZxid = 0x200000075
        cversion = 0
        dataVersion = 0
        aclVersion = 0
        ephemeralOwner = 0x1382791d4e50004
        dataLength = 42
        numChildren = 0
        [zk: xx.xx.xx.55:2182(CONNECTED) 1]

        Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed.

        HOST-10-18-40-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074
        2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
        2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
        2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/10.18.40.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
        2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a

        HOST-10-18-40-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075
        2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
        2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
        2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/10.18.40.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
        2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a


         Close session and create requests coming almost parallely.


        Env:
        Hadoop setup.
        We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
        NN102 was active and NN55 was standby.
        FailoverController at 102 got shut down due to ZK connection error.
        The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK
        In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.

        Zk version: 3.4.3

        Ephemeral node still exists in Zookeeper:

        HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date

        Tue Jun 26 16:07:04 IST 2012
        HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182
        Connecting to xx.xx.xx.55:2182
        Welcome to ZooKeeper!
        JLine support is enabled
        [zk: xx.xx.xx.55:2182(CONNECTING) 0]
        WATCHER::

        WatchedEvent state:SyncConnected type:None path:null

        [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock

        haclusternn2HOSt-xx-xx-xx-102 ��
        cZxid = 0x200000075
        ctime = Tue Jun 26 13:10:19 IST 2012
        mZxid = 0x200000075
        mtime = Tue Jun 26 13:10:19 IST 2012
        pZxid = 0x200000075
        cversion = 0
        dataVersion = 0
        aclVersion = 0
        ephemeralOwner = 0x1382791d4e50004
        dataLength = 42
        numChildren = 0
        [zk: xx.xx.xx.55:2182(CONNECTED) 1]

        Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed.

        HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074
        2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
        2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
        2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
        2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a

        HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075
        2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
        2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
        2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
        2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a


         Close session and create requests coming almost parallely.


        Env:
        Hadoop setup.
        We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
        NN102 was active and NN55 was standby.
        FailoverController at 102 got shut down due to ZK connection error.
        The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK
        Rakesh R made changes -
        Attachment ZOOKEEPER-1496.patch [ 12533949 ]
        Camille Fournier made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Rakesh R made changes -
        Attachment ZOOKEEPER-1496.1.patch [ 12533973 ]
        Rakesh R made changes -
        Assignee Rakesh R [ rakeshr ]
        Rakesh R made changes -
        Attachment ZOOKEEPER-1496.2.patch [ 12536199 ]
        Rakesh R made changes -
        Affects Version/s 3.4.3 [ 12319288 ]
        Mahadev konar made changes -
        Fix Version/s 3.4.4 [ 12319841 ]
        Mahadev konar made changes -
        Priority Major [ 3 ] Critical [ 2 ]
        Mahadev konar made changes -
        Component/s server [ 12312382 ]
        Mahadev konar made changes -
        Fix Version/s 3.5.0 [ 12316644 ]
        Rakesh R made changes -
        Attachment ZOOKEEPER-1496.3.patch [ 12545370 ]
        Rakesh R made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Rakesh R made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Mahadev konar made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            Rakesh R
            Reporter:
            suja s
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development