ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1934

Stale data received from sync'd ensemble peer

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      In our regression testing we encountered an error wherein we were caching a value we read from zookeeper and then experienced session loss. We subsequently got reconnected to a different zookeeper server. When we tried to read the same path from this new zookeeper server we are getting a stale value.

      Specifically, we are reading "/binchanges" and originally got back a value of "3" from the first server. After we lost connection and reconnected before the session timeout, we then read "/binchanges" from the new server and got back a value of "2". In our code path we never set this value from 3 to 2. We throw an assertion if the value ever goes backwards. Which is how we caught this error.

      It's my understanding of the single system image guarantee that this should never be allowed. I realize that the single system image guarantee is still quorum based and it's certainly possible that a minority of the ensemble may have stale data. However, I also believe that each client has to send the highest zxid it's seen as part of its connection request to the server. And if the server it's connecting to has a smaller zxid than the value the client sends, then the connection request should be refused.

      Assuming I have all of that correct, then I'm at a loss for how this happened.

      The failure happened around Jun 4 08:13:44. Just before that, at June 4 08:13:30 there was a round of leader election. During that round of leader election we voted server with id=4 and zxid=0x300001c4c. This then led to a new zxid=0x400000001. The new leader sends a diff to all the servers including the one we will soon read the stale data from (id=2). Server with ID=2's log files also reflect that as of 08:13:43 it was up to date and current with an UPTODATE message.

      I'm going to attach log files from all 5 ensemble nodes. I also used zktreeutil to dump the database out for the 5 ensemble nodes. I diff'd those, and compared them all for correctness. 1 of the nodes (id=2) has a massively divergent zktreeutil dump than the other 4 nodes even though it received the diff from the new leader.

      In the attachments there are 5 nodes. I will number each log file by it's zookeeper id, e.g. node4.log.

      1. node1.log
        1.01 MB
        Marshall McMullen
      2. node2.log
        312 kB
        Marshall McMullen
      3. node3.log
        986 kB
        Marshall McMullen
      4. node4.log
        6.86 MB
        Marshall McMullen
      5. node5.log
        3.48 MB
        Marshall McMullen

        Activity

        Hide
        Marshall McMullen added a comment -

        Log files from all 5 ensemble nodes.

        Show
        Marshall McMullen added a comment - Log files from all 5 ensemble nodes.
        Hide
        Marshall McMullen added a comment -

        Diffing the zktreeutil dumps of each server is also interesting. There are a few minor differences with local sessions:

        diff -a node1.zktree node3.zktree
        8933,8934d8932
        < |--[144115323715452941]
        < |
        9162,9163d9159
        < |
        < |--[72058779056865292]

        diff -a node1.zktree node4.zktree
        8933,8934d8932
        < |--[144115323715452941]
        < |
        9005,9006d9002
        < |--[216173168961912851]
        < |
        9162,9163d9157
        < |
        < |--[72058779056865292]

        diff -a node1.zktree node5.zktree
        8933,8934d8932
        < |--[144115323715452941]
        < |
        9005,9006d9002
        < |--[216173168961912851]
        < |
        9065,9066d9060
        < |--[288230547757793293]
        < |
        9162,9163d9155
        < |
        < |--[72058779056865292]

        Whereas node2 is MASSIVELY different.

        In particular, the /binchanges value is different:

        --[binchanges] --[binchanges]
             
          --[version => 3]     --[version => 2]
        Show
        Marshall McMullen added a comment - Diffing the zktreeutil dumps of each server is also interesting. There are a few minor differences with local sessions: diff -a node1.zktree node3.zktree 8933,8934d8932 < |-- [144115323715452941] < | 9162,9163d9159 < | < |-- [72058779056865292] diff -a node1.zktree node4.zktree 8933,8934d8932 < |-- [144115323715452941] < | 9005,9006d9002 < |-- [216173168961912851] < | 9162,9163d9157 < | < |-- [72058779056865292] diff -a node1.zktree node5.zktree 8933,8934d8932 < |-- [144115323715452941] < | 9005,9006d9002 < |-- [216173168961912851] < | 9065,9066d9060 < |-- [288230547757793293] < | 9162,9163d9155 < | < |-- [72058779056865292] Whereas node2 is MASSIVELY different. In particular, the /binchanges value is different: -- [binchanges] -- [binchanges]         -- [version => 3]     -- [version => 2]
        Hide
        Marshall McMullen added a comment -

        Yet before we grabbed this data, the offending node (nodeid=2, myid=2) stated this:

        1723 Jun 4 08:13:30 zookeeper - INFO [QuorumPeer[myid=2]/10.26.65.47:2181:ZooKeeperServer@156] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /sf/data/zoo
        1724 Jun 4 08:13:30 zookeeper - INFO [QuorumPeer[myid=2]/10.26.65.47:2181:Follower@66] - FOLLOWING - LEADER ELECTION TOOK - -1401867542249
        1725 Jun 4 08:13:30 zookeeper - WARN [QuorumPeer[myid=2]/10.26.65.47:2181:Learner@240] - Unexpected exception, tries=0, connecting to /10.26.65.103:2182
        1726 Jun 4 08:13:30 localhost at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:232)
        1727 Jun 4 08:13:30 localhost at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:74)
        1728 Jun 4 08:13:30 localhost at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:967)
        1729 Jun 4 08:13:30 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.103:35987
        1730 Jun 4 08:13:30 zookeeper - WARN [NIOWorkerThread-37:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running
        1731 Jun 4 08:13:30 zookeeper - INFO [NIOWorkerThread-37:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.103:35987 (no session established for client)
        1732 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.103:59764
        1733 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-38:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running
        1734 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-38:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.103:59764 (no session established for client)
        1735 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.103:51005
        1736 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-39:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running
        1737 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-39:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.103:51005 (no session established for client)
        1738 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.3:39628
        1739 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-40:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running
        1740 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-40:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.3:39628 (no session established for client)
        1741 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.3:47705
        1742 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-41:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running
        1743 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-41:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.3:47705 (no session established for client)
        1744 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.3:34353
        1745 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-42:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running
        1746 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-42:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.3:34353 (no session established for client)
        1747 Jun 4 08:13:31 zookeeper - INFO [QuorumPeer[myid=2]/10.26.65.47:2181:Learner@332] - Getting a diff from the leader 0x300001c4c
        1748 Jun 4 08:13:31 zookeeper - INFO [QuorumPeer[myid=2]/10.26.65.47:2181:Learner@475] - Learner received NEWLEADER message
        1749 Jun 4 08:13:31 zookeeper - WARN [QuorumPeer[myid=2]/10.26.65.47:2181:QuorumPeer@1271] - setLastSeenQuorumVerifier called with stale config 4294967296. Current version: 4294967296
        1750 Jun 4 08:13:31 zookeeper - INFO [QuorumPeer[myid=2]/10.26.65.47:2181:FileTxnSnapLog@297] - Snapshotting: 0x300001c4c to /sf/data/zookeeper/10.26.65.47/version-2/snapshot.300001c4c
        1751 Jun 4 08:13:31 zookeeper - INFO [QuorumPeer[myid=2]/10.26.65.47:2181:Learner@460] - Learner received UPTODATE message

        Show
        Marshall McMullen added a comment - Yet before we grabbed this data, the offending node (nodeid=2, myid=2) stated this: 1723 Jun 4 08:13:30 zookeeper - INFO [QuorumPeer [myid=2] /10.26.65.47:2181:ZooKeeperServer@156] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /sf/data/zoo 1724 Jun 4 08:13:30 zookeeper - INFO [QuorumPeer [myid=2] /10.26.65.47:2181:Follower@66] - FOLLOWING - LEADER ELECTION TOOK - -1401867542249 1725 Jun 4 08:13:30 zookeeper - WARN [QuorumPeer [myid=2] /10.26.65.47:2181:Learner@240] - Unexpected exception, tries=0, connecting to /10.26.65.103:2182 1726 Jun 4 08:13:30 localhost at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:232) 1727 Jun 4 08:13:30 localhost at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:74) 1728 Jun 4 08:13:30 localhost at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:967) 1729 Jun 4 08:13:30 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.103:35987 1730 Jun 4 08:13:30 zookeeper - WARN [NIOWorkerThread-37:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running 1731 Jun 4 08:13:30 zookeeper - INFO [NIOWorkerThread-37:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.103:35987 (no session established for client) 1732 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.103:59764 1733 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-38:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running 1734 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-38:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.103:59764 (no session established for client) 1735 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.103:51005 1736 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-39:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running 1737 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-39:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.103:51005 (no session established for client) 1738 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.3:39628 1739 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-40:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running 1740 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-40:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.3:39628 (no session established for client) 1741 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.3:47705 1742 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-41:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running 1743 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-41:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.3:47705 (no session established for client) 1744 Jun 4 08:13:31 zookeeper - INFO [NIOServerCxnFactory.AcceptThread:/10.26.65.47:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /10.26.65.3:34353 1745 Jun 4 08:13:31 zookeeper - WARN [NIOWorkerThread-42:NIOServerCnxn@365] - Exception causing close of session 0x0: ZooKeeperServer not running 1746 Jun 4 08:13:31 zookeeper - INFO [NIOWorkerThread-42:NIOServerCnxn@999] - Closed socket connection for client /10.26.65.3:34353 (no session established for client) 1747 Jun 4 08:13:31 zookeeper - INFO [QuorumPeer [myid=2] /10.26.65.47:2181:Learner@332] - Getting a diff from the leader 0x300001c4c 1748 Jun 4 08:13:31 zookeeper - INFO [QuorumPeer [myid=2] /10.26.65.47:2181:Learner@475] - Learner received NEWLEADER message 1749 Jun 4 08:13:31 zookeeper - WARN [QuorumPeer [myid=2] /10.26.65.47:2181:QuorumPeer@1271] - setLastSeenQuorumVerifier called with stale config 4294967296. Current version: 4294967296 1750 Jun 4 08:13:31 zookeeper - INFO [QuorumPeer [myid=2] /10.26.65.47:2181:FileTxnSnapLog@297] - Snapshotting: 0x300001c4c to /sf/data/zookeeper/10.26.65.47/version-2/snapshot.300001c4c 1751 Jun 4 08:13:31 zookeeper - INFO [QuorumPeer [myid=2] /10.26.65.47:2181:Learner@460] - Learner received UPTODATE message
        Hide
        Michi Mutsuzaki added a comment -

        Hi Marshall,

        Yes, the server is supposed to refuse the client connection if the client has seen a higher zxid than the server.

        https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L825

        Does /binchanges on node2 eventually become 3, or is it stuck at 2?

        Show
        Michi Mutsuzaki added a comment - Hi Marshall, Yes, the server is supposed to refuse the client connection if the client has seen a higher zxid than the server. https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L825 Does /binchanges on node2 eventually become 3, or is it stuck at 2?
        Hide
        Marshall McMullen added a comment -

        Michi Mutsuzaki - thanks for looking at this issue. I saw the same code you linked to and agree on the intended behavior. The log message in that block of code is NOT present.

        We did not see /binchanges update to the correct value of 3. It looked to be stuck at 2. Which really defies explanation.

        Show
        Marshall McMullen added a comment - Michi Mutsuzaki - thanks for looking at this issue. I saw the same code you linked to and agree on the intended behavior. The log message in that block of code is NOT present. We did not see /binchanges update to the correct value of 3. It looked to be stuck at 2. Which really defies explanation.
        Hide
        Raul Gutierrez Segales added a comment -

        Well, but you are using local sessions right? Which means that upon a ConnectionLoss and moving to a new server you will get a SessionExpired (if your session was local), right? After your session expires, your new client won't know anything about your last highest seen zxid?

        Show
        Raul Gutierrez Segales added a comment - Well, but you are using local sessions right? Which means that upon a ConnectionLoss and moving to a new server you will get a SessionExpired (if your session was local), right? After your session expires, your new client won't know anything about your last highest seen zxid?
        Hide
        Marshall McMullen added a comment -

        Raul Gutierrez Segales - No, we are not using local sessions.

        Show
        Marshall McMullen added a comment - Raul Gutierrez Segales - No, we are not using local sessions.
        Hide
        Jared Cantwell added a comment -

        I am working with Marshall on this issue. Upon investigating more, I noticed that all of our processes that had problems had just re-established a connection to node2 immediately before getting stale state from that ZK (through node2). This seems to match with Marshall's observation that dumping state with zktreeutil on node2 shows a state that is out of date. It was even stale some time afterward.

        So right now I am investigating how node2 could think its local database was at zxid N+1 (hence why it accepted the connection from the client), but somehow the state of the database is reflecting zxid <= N. Leader election happened about 15 seconds prior to this, but connections were just getting session connected events. I'm still digging through the leader election logs, but nothing stands out as suspicious to me yet.

        Any help or ideas for things to look for would be much appreciated.

        Show
        Jared Cantwell added a comment - I am working with Marshall on this issue. Upon investigating more, I noticed that all of our processes that had problems had just re-established a connection to node2 immediately before getting stale state from that ZK (through node2). This seems to match with Marshall's observation that dumping state with zktreeutil on node2 shows a state that is out of date. It was even stale some time afterward. So right now I am investigating how node2 could think its local database was at zxid N+1 (hence why it accepted the connection from the client), but somehow the state of the database is reflecting zxid <= N. Leader election happened about 15 seconds prior to this, but connections were just getting session connected events. I'm still digging through the leader election logs, but nothing stands out as suspicious to me yet. Any help or ideas for things to look for would be much appreciated.

          People

          • Assignee:
            Unassigned
            Reporter:
            Marshall McMullen
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development