Details
Bug
Status: Resolved
Critical
Resolution: Fixed
1.0.0, 1.0.1
None
Description
When a ZooKeeper reconnect happens, the Nimbus registry node can be deleted even though Nimbus is alive.
Below is the ZooKeeper node for the Nimbus registry, read twice.
get /storm/nimbuses/<host>:6627
?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^??????? ?'h?g?g?g?g t-?,[??Q
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x4000005ae
mtime = Fri Jul 01 11:43:51 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0

get /storm/nimbuses/<host>:6627
?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^??????? ?'h?g?g?g?g t-?,[??Q
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x50000000e
mtime = Fri Jul 01 11:46:08 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0
Below is the transaction log for that node.
7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae create '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,v{s{31,s{'world,'anyone}}},T,10
7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e setData '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,1
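Please take a look at ctime, mtime, and ephemeralOwner: the node was created at 11:43:51 by session 0x255a62e310c0005 and modified at 11:46:08 by session 0x355a647bd8c0000, yet ephemeralOwner is still 0x255a62e310c0005.
The ephemeral owner session had already been closed on the Nimbus side, but the node may not be deleted immediately. As a result, the new session does not create a new node; it sets the value on the existing ephemeral node, which still belongs to the other, already closed session.
Eventually that node is deleted, even though session 0x355a647bd8c0000 is alive. The Nimbus log shows the old session being closed before the setData above: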
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ClientCnxn [DEBUG] Disconnecting client for session: 0x255a62e310c0005
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x255a62e310c0005 closed
As a fix, when the reconnect event handler is called, we can delete the existing node first and then create the ephemeral node again, so that it is owned by the new session.
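
The difference between the two behaviours can be illustrated with a small in-memory simulation. This is only a sketch: FakeStore is a hypothetical stand-in for ZooKeeper's ephemeral-node semantics, not the ZooKeeper or Curator API, and registerBuggy / registerFixed are illustrative names, not the actual Storm code. It assumes that setData leaves ephemeralOwner unchanged (as the stat output above shows) and that the server removes every node owned by a session once that session is cleaned up.

```java
import java.util.HashMap;
import java.util.Map;

class NimbusRegistrySketch {
    static final String PATH = "/storm/nimbuses/<host>:6627";

    /** Hypothetical in-memory model of ephemeral nodes; NOT the ZooKeeper API. */
    static class FakeStore {
        private final Map<String, Long> owners = new HashMap<>(); // path -> owning session
        private final Map<String, String> data = new HashMap<>(); // path -> payload

        boolean exists(String path) {
            return owners.containsKey(path);
        }

        void createEphemeral(long session, String path, String value) {
            if (exists(path)) {
                throw new IllegalStateException("node exists: " + path);
            }
            owners.put(path, session);
            data.put(path, value);
        }

        // Updates the payload only; the owning session stays the same.
        void setData(String path, String value) {
            if (!exists(path)) {
                throw new IllegalStateException("no node: " + path);
            }
            data.put(path, value);
        }

        void delete(String path) {
            owners.remove(path);
            data.remove(path);
        }

        // Models the server removing all ephemeral nodes of a closed session.
        void cleanUpSession(long session) {
            owners.entrySet().removeIf(e -> e.getValue() == session);
            data.keySet().retainAll(owners.keySet());
        }
    }

    // Reported behaviour: reuse the node if it is still there.
    static void registerBuggy(FakeStore store, long session, String value) {
        if (store.exists(PATH)) {
            store.setData(PATH, value);
        } else {
            store.createEphemeral(session, PATH, value);
        }
    }

    // Proposed behaviour: delete any leftover node, then create it again.
    static void registerFixed(FakeStore store, long session, String value) {
        if (store.exists(PATH)) {
            store.delete(PATH);
        }
        store.createEphemeral(session, PATH, value);
    }

    // Returns true if the registry node is still present after the old
    // session's ephemeral nodes are cleaned up.
    static boolean survivesOldSessionCleanup(boolean fixed) {
        FakeStore store = new FakeStore();
        long oldSession = 0x255a62e310c0005L;
        long newSession = 0x355a647bd8c0000L;

        store.createEphemeral(oldSession, PATH, "nimbus-info");
        // Reconnect: the new session registers before the old node is gone.
        if (fixed) {
            registerFixed(store, newSession, "nimbus-info");
        } else {
            registerBuggy(store, newSession, "nimbus-info");
        }
        store.cleanUpSession(oldSession);
        return store.exists(PATH);
    }

    public static void main(String[] args) {
        System.out.println("buggy, node survives: " + survivesOldSessionCleanup(false));
        System.out.println("fixed, node survives: " + survivesOldSessionCleanup(true));
    }
}
```

In this model the buggy path loses the node when the old session is cleaned up, while the delete-then-create path keeps it because the node is owned by the new session. A real implementation would also need to handle the race where the server deletes the old node between the existence check and the delete.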