Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.4.11, 3.4.12
-
None
-
None
-
None
Description
The truncate() function in FileTxnLog.java may fail to properly remove the uncommitted write. This happens when the follower that has the uncommitted writes tries to resync with the leader after a few epochs have past. The failure results in data inconsistency in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency.
Initially:
- Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader)
- Create 5 znodes with initial values as follow (key = value)
/testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4
To Reproduce:
- Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2 - Restart the node 0 and 1 (let them finish with resync)
- Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1 - Restart the node 0, 1 and 2 (let them finish with resync)
- Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 2 - Restart the node 0 and 2 (let them finish with resync)
- Diverge the node 2
a. Shutdown the node 0
b. Async setData to the node 2 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 2 - Restart the node 0 and 1 (let them finish with resync)
- Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 1 - Restart the node 0 and 2 (let them finish with resync)
- Restart the node 1 (let it finish with resync)
Reading each key from each node directly will give us the output:
/testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 2 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 1003 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004
Thus, the value of key /testDivergenceResync3 is inconsistent across nodes.
What seems to happen:
- At the step 7, setData (at zxid 0x400000001) writing value 1003 is committed on the node 2.
Log from the node 2:
...
2017-11-16 03:08:14,123 [myid:2] - DEBUG [ProcessThread(sid:2 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x2000117327c0000 type:setData cxid:0x4 zxid:0x400000001 txntype:5 reqpath:n/a
2017-11-16 03:08:14,124 [myid:2] - DEBUG [ProcessThread(sid:2 cport:-1)::Leader@787] - Proposing:: sessionid:0x2000117327c0000 type:setData cxid:0x4 zxid:0x400000001 txntype:5 reqpath:n/a
2017-11-16 03:08:14,124 [myid:2] - INFO [SyncThread:2:FileTxnLog@209] - Creating new log file: log.400000001
2017-11-16 03:08:14,188 [myid:2] - DEBUG [SyncThread:2:Leader@600] - Count for zxid: 0x400000001 is 1
2017-11-16 03:08:15,752 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@512] - Shutting down
2017-11-16 03:08:15,753 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 2 ]
... - At the step 10, the node 2 is restarted and supposed to be properly resync’ed with the node 0 which is the leader.
a. The node 0 sends TRUNC msg so that the node 2 can truncate the setData at zxid 0x400000001..
b. On the other hand, the node 2 tries to truncate log to get in sync with the leader 0x200000001. However, the node 2 failed to properly truncate the setData at zxid 0x400000001. So, even if resync was finished, the value 1003 is still remained intact on the node 2 while other nodes have value 3 for the same key.
c. It seems on the node 2, there is only log.100000001 and log.400000001 but no log.200000001. This seems to cause failing to delete log.400000001 during truncate(). It looks like we will be considering log.400000001 by the time returning from the init() of FileTxnLog.java so that we will never execute ‘itr.logFile.delete()’ for the log.400000001.
d. Then, after returning from the truncate(), loadDatabase() will be invoked and log.400000001 will be read and the setData at zxid 0x400000001 gets loaded into the in-memory data tree.Log from the node 2:...
2017-11-16 03:08:59,051 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 215
2017-11-16 03:08:59,052 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:QuorumPeer$QuorumServer@184] - Resolved hostname: 127.0.0.1 to address: /127.0.0.1
2017-11-16 03:08:59,125 [myid:2] - WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@349] - Truncating log to get in sync with the leader 0x200000001
2017-11-16 03:08:59,125 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - Created new input stream /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,125 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - Created new input archive /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,126 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF excepton java.io.EOFException
2017-11-16 03:08:59,126 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - Created new input stream /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,126 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - Created new input archive /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,126 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileSnap@83] - Reading snapshot /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.200000001
2017-11-16 03:08:59,127 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - Created new input stream /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,127 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - Created new input archive /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF excepton java.io.EOFException
2017-11-16 03:08:59,128 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - Created new input stream /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - Created new input archive /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF excepton java.io.EOFException
2017-11-16 03:08:59,131 [myid:2] - WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@387] - Got zxid 0x500000001 expected 0x1
2017-11-16 03:08:59,132 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnSnapLog@248] - Snapshotting: 0x500000004 to /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.500000004
...Log from the node 0:...
2017-11-16 03:08:59,050 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@372] - LEADING - LEADER ELECTION TOOK - 222
2017-11-16 03:08:59,055 [myid:0] - INFO [LearnerHandler-/127.0.0.1:54482:LearnerHandler@346] - Follower sid: 2 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@266e422
2017-11-16 03:08:59,124 [myid:0] - INFO [LearnerHandler-/127.0.0.1:54482:LearnerHandler@401] - Synchronizing with Follower sid: 2 maxCommittedLog=0x500000004 minCommittedLog=0x100000001 peerLastZxid=0x400000001
2017-11-16 03:08:59,124 [myid:0] - DEBUG [LearnerHandler-/127.0.0.1:54482:LearnerHandler@415] - proposal size is 14
2017-11-16 03:08:59,124 [myid:0] - DEBUG [LearnerHandler-/127.0.0.1:54482:LearnerHandler@418] - Sending proposals to follower
2017-11-16 03:08:59,124 [myid:0] - INFO [LearnerHandler-/127.0.0.1:54482:LearnerHandler@475] - Sending TRUNC
2017-11-16 03:08:59,147 [myid:0] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@215] - Accepted socket connection from /127.0.0.1:55118
2017-11-16 03:08:59,184 [myid:0] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@383] - Exception causing close of session 0x0: ZooKeeperServer not running
2017-11-16 03:08:59,184 [myid:0] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@386] - IOException stack trace
java.io.IOException: ZooKeeperServer not running
at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:08:59,184 [myid:0] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1040] - Closed socket connection for client /127.0.0.1:55118 (no session established for client)
2017-11-16 03:08:59,224 [myid:0] - INFO [LearnerHandler-/127.0.0.1:54482:LearnerHandler@535] - Received NEWLEADER-ACK message from 2
2017-11-16 03:08:59,224 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@962] - Have quorum of supporters, sids: [ 0,2 ]; starting up and setting last processed zxid: 0x600000000
...