[ZOOKEEPER-1549] Data inconsistency when follower is receiving a DIFF with a dirty snapshot - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.4.3
Fix Version/s: None
Component/s: quorum
Labels:
None

Description

the trunc code (from ~~ZOOKEEPER-1154~~?) cannot work correct if the snapshot is not correct.
here is scenario(similar to 1154):
Initial Condition
1. Lets say there are three nodes in the ensemble A,B,C with A being the leader
2. The current epoch is 7.
3. For simplicity of the example, lets say zxid is a two digit number, with epoch being the first digit.
4. The zxid is 73
5. All the nodes have seen the change 73 and have persistently logged it.
Step 1
Request with zxid 74 is issued. The leader A writes it to the log but there is a crash of the entire ensemble and B,C never write the change 74 to their log.
Step 2
A,B restart, A is elected as the new leader, and A will load data and take a clean snapshot(change 74 is in it), then send diff to B, but B died before sync with A. A died later.
Step 3
B,C restart, A is still down
B,C form the quorum
B is the new leader. Lets say B minCommitLog is 71 and maxCommitLog is 73
epoch is now 8, zxid is 80
Request with zxid 81 is successful. On B, minCommitLog is now 71, maxCommitLog is 81
Step 4
A starts up. It applies the change in request with zxid 74 to its in-memory data tree
A contacts B to registerAsFollower and provides 74 as its ZxId
Since 71<=74<=81, B decides to send A the diff.
Problem:
The problem with the above sequence is that after truncate the log, A will load the snapshot again which is not correct.

In 3.3 branch, FileTxnSnapLog.restore does not call listener(~~ZOOKEEPER-874~~), the leader will send a snapshot to follower, it will not be a problem.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

case.patch
13/Sep/12 08:12
6 kB
Jacky007
ZOOKEEPER-1549-learner.patch
06/Jan/13 23:27
8 kB
Flavio Paiva Junqueira
ZOOKEEPER-1549-3.4.patch
30/Jan/13 05:56
134 kB
Thawan Kooburat

Issue Links

is related to

ZOOKEEPER-107 Allow dynamic changes to server cluster membership

Resolved

Sub-Tasks

1.	Leader should not snapshot uncommitted state	Closed	Flavio Paiva Junqueira
2.	Learner should not snapshot uncommitted state	Open	Hongchao Deng
3.	Change TRUNC to SNAP in sync phase for safety guarantee	Open	Unassigned

Activity

People

Assignee:: Flavio Paiva Junqueira

Reporter:: Jacky007

Votes:: 2 Vote for this issue

Watchers:: 21 Start watching this issue

Dates

Created:: 10/Sep/12 07:58

Updated:: 03/Feb/22 08:50